Recipients Luis Antonio Benítez, Carolina Cuesta-Lazaro, and Fernando Romero López receive support for their scientific research.
( 6 min )
ONNX (Open Neural Network Exchange) is an open-source standard for representing deep learning models widely supported by many providers. ONNX provides tools for optimizing and quantizing models to reduce the memory and compute needed to run machine learning (ML) models. One of the biggest benefits of ONNX is that it provides a standardized format for […]
( 14 min )
We are excited to announce the open-source release of GraphStorm 0.1, a low-code enterprise graph machine learning (ML) framework to build, train, and deploy graph ML solutions on complex enterprise-scale graphs in days instead of months. With GraphStorm, you can build solutions that directly take into account the structure of relationships or interactions between billions […]
( 9 min )
Microsoft has publicly endorsed OpenAI, with ‘Copilot’ embedded in every single bit of the Microsoft stack. Behind the scenes, with everything closed source, nobody knew whether these AI assistants were driven by Cortana, Bing, or OpenAI. The assistant technology is not new, and other than code generation and assisted writing, some wonder what value… Read More »The unannounced next-level partnership between Microsoft and Databricks
The post The unannounced next-level partnership between Microsoft and Databricks appeared first on Data Science Central.
( 22 min )
High-quality app development can significantly drive your business growth and success while boosting customer satisfaction and bringing in more clients. However, with millions of apps existing in the market, standing out from the competition requires more than just a great idea and an appealing design. Data engineering is what can help you, playing a pivotal… Read More »The Importance of Data Engineering for a Profitable App Development
The post The Importance of Data Engineering for a Profitable App Development appeared first on Data Science Central.
( 21 min )
When meteor showers occur every few months, viewers get to watch a dazzling scene of shooting stars and light streaks scattering across the night sky. Normally, meteors are just small pieces of rock and dust from space that quickly burn up upon entering Earth’s atmosphere. But the story would take a darker turn if a Read article >
( 7 min )
Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in
Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a
homogeneous design where the same number of experts of the same size are placed
uniformly throughout the network. Furthermore, existing MoE works do not
consider computational constraints (e.g., FLOPs, latency) to guide their
design. To this end, we develop AutoMoE -- a framework for designing
heterogeneous MoEs under computational constraints. AutoMoE leverages Neural
Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with
4x inference speedup (CPU) and FLOPs reduction over manually designed
Transformers, with parity in BLEU score over dense Transformer and within 1
BLEU point of MoE SwitchTransformer, on aggregate over benchmark datasets for
NMT. Heterogeneous search space with dense and sparsely activated Transformer
modules (e.g., how many experts? where to place them? what should be their
sizes?) allows for adaptive compute -- where different amounts of computations
are used for different tokens in the input. Adaptivity comes naturally from
routing decisions which send tokens to experts of different sizes. AutoMoE
code, data, and trained models are available at https://aka.ms/AutoMoE.
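As a rough illustration of the heterogeneous-experts idea, the sketch below implements top-1 routing over feed-forward experts with different hidden sizes in plain NumPy. All names, sizes, and the toy router are illustrative assumptions, not the AutoMoE code; the point is that per-token compute depends on which expert a token is routed to.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(d_model, d_hidden, rng):
    """A tiny ReLU FFN expert; heterogeneous MoE lets d_hidden differ per expert."""
    w1 = rng.normal(0, 0.02, (d_model, d_hidden))
    w2 = rng.normal(0, 0.02, (d_hidden, d_model))
    return lambda x: np.maximum(x @ w1, 0) @ w2

d_model = 16
# Experts of *different* sizes: the kind of design choice AutoMoE searches over.
hidden_sizes = [8, 32, 128]
experts = [make_expert(d_model, h, rng) for h in hidden_sizes]
router_w = rng.normal(0, 0.02, (d_model, len(experts)))

def moe_layer(tokens):
    """Top-1 routing: each token is sent to one expert, so the compute spent
    on a token depends on the size of its expert (adaptive compute)."""
    scores = tokens @ router_w                 # (n_tokens, n_experts)
    choice = scores.argmax(axis=1)             # top-1 expert per token
    out = np.empty_like(tokens)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            out[mask] = expert(tokens[mask])
    return out, choice

tokens = rng.normal(size=(10, d_model))
out, choice = moe_layer(tokens)
print(out.shape, choice)
```

Routing decisions like `choice` are what make the compute adaptive: tokens sent to the 8-unit expert cost far fewer FLOPs than tokens sent to the 128-unit one.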
( 2 min )
This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based on a
transformer-based attention mechanism and temporal pooling for Non-Intrusive
Load Monitoring (NILM) of smart buildings. The paper aims to improve the
accuracy of NILM by using a deep learning-based method. The proposed method
uses a Seq2Seq model with a transformer-based attention mechanism to capture
the long-term dependencies of NILM data. Additionally, temporal pooling is used
to improve the model's accuracy by capturing both the steady-state and
transient behavior of appliances. The paper evaluates the proposed method on a
publicly available dataset and compares the results with other state-of-the-art
NILM techniques. The results demonstrate that the proposed method outperforms
the existing methods in terms of both accuracy and computational efficiency.
( 2 min )
In recent years, indoor human presence detection based on supervised learning
(SL) and channel state information (CSI) has attracted much attention. However,
existing studies that rely on spatial information of CSI are susceptible to
environmental changes which degrade prediction accuracy. Moreover, SL-based
methods require time-consuming data labeling for retraining models. Therefore,
it is imperative to design a continuously monitored model using a
semi-supervised learning (SSL) based scheme. In this paper, we conceive a
bifold teacher-student (BTS) learning approach for indoor human presence
detection in an adjoining two-room scenario. The proposed SSL-based primal-dual
teacher-student network intelligently learns spatial and temporal features from
labeled and unlabeled CSI datasets. Additionally, the enhanced penalized loss
function leverages entropy and distance measures to distinguish drifted data,
i.e., features of new datasets affected by time-varying effects and altered
from the original distribution. Experimental results demonstrate that the
proposed BTS system sustains asymptotic accuracy after retraining the model
with unlabeled data. Furthermore, BTS outperforms existing SSL-based models in
terms of the highest detection accuracy while achieving the asymptotic
performance of SL-based methods.
( 2 min )
The use of Shap scores has become widespread in Explainable AI. However,
their computation is in general intractable, in particular when done with a
black-box classifier such as a neural network. Recent research has unveiled
classes of open-box Boolean Circuit classifiers for which Shap can be computed
efficiently. We show how to transform binary neural networks into those
circuits for efficient Shap computation. We use logic-based knowledge
compilation techniques. The performance gain is huge, as our experiments show.
( 2 min )
We have recently witnessed a number of impressive results on hard
mathematical reasoning problems with language models. At the same time, the
robustness of these models has also been called into question; recent works
have shown that models can rely on shallow patterns in the problem description
when generating a solution. Building on the idea of behavioral testing, we
propose a novel framework, which pins down the causal effect of various factors
in the input, e.g., the surface form of the problem text, the operands, and
math operators on the output solution. By grounding the behavioral analysis in
a causal graph describing an intuitive reasoning process, we study the behavior
of language models in terms of robustness and sensitivity to direct
interventions in the input space. We apply our framework on a test bed of math
word problems. Our analysis shows that robustness does not appear to
continuously improve as a function of size, but the GPT-3 Davinci models (175B)
achieve a dramatic improvement in both robustness and sensitivity compared to
all other GPT variants.
( 2 min )
Despite the intense attention and considerable investment into clinical
machine learning research, relatively few applications have been deployed at a
large-scale in a real-world clinical environment. While research is important
in advancing the state-of-the-art, translation is equally important in bringing
these techniques and technologies into a position to ultimately impact
healthcare. We believe a lack of appreciation for several considerations is a
major cause for this discrepancy between expectation and reality. To better
characterize a holistic perspective among researchers and practitioners, we
survey several practitioners with commercial experience in developing clinical
machine learning (CML) systems for clinical deployment. Using these insights,
we identify several main categories
of challenges in order to better design and develop clinical machine learning
applications.
( 2 min )
Denoising is intuitively related to projection. Indeed, under the manifold
hypothesis, adding random noise is approximately equivalent to orthogonal
perturbation. Hence, learning to denoise is approximately learning to project.
In this paper, we use this observation to reinterpret denoising diffusion
models as approximate gradient descent applied to the Euclidean distance
function. We then provide a straightforward convergence analysis of the DDIM
sampler under simple assumptions on the projection-error of the denoiser.
Finally, we propose a new sampler based on two simple modifications to DDIM
using insights from our theoretical results. In as few as 5-10 function
evaluations, our sampler achieves state-of-the-art FID scores on pretrained
CIFAR-10 and CelebA models and can generate high quality samples on latent
diffusion models.
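The denoising-as-projection view can be illustrated on a toy manifold. In the sketch below (our illustration, not the paper's sampler), the "ideal denoiser" projects onto the unit circle, and the denoising iteration is exactly gradient descent on half the squared Euclidean distance to the manifold:

```python
import numpy as np

def project_to_circle(x):
    """Ideal denoiser for a toy 'manifold': the unit circle in R^2."""
    return x / np.linalg.norm(x)

def dist_grad(x):
    """Gradient of 0.5 * dist(x, manifold)^2 is x - proj(x)."""
    return x - project_to_circle(x)

x = np.array([3.0, 4.0])           # noisy point, distance 4 from the circle
eta = 0.5                          # step size
for _ in range(20):                # denoising iterations = gradient descent
    x = x - eta * dist_grad(x)

print(np.linalg.norm(x))           # ≈ 1: converged onto the manifold
</antml>```

Each step here contracts the radial distance to the manifold by a constant factor, which is the intuition behind analysing DDIM under assumptions on the denoiser's projection error.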
( 2 min )
Privately generating synthetic data from a table is an important brick of a
privacy-first world. We propose and investigate a simple approach of treating
each row in a table as a sentence and training a language model with
differential privacy. We show this approach obtains competitive results in
modelling tabular data across multiple datasets, even at small scales that
favor alternative methods based on marginal distributions.
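The row-as-sentence idea can be sketched in a few lines. The column names and the "col is val" template below are illustrative assumptions, and the differentially private training itself is omitted:

```python
import csv
import io

def row_to_sentence(row):
    """Serialize one table row as a 'sentence' a language model can be
    trained on (the template is an illustrative choice)."""
    return ", ".join(f"{col} is {val}" for col, val in row.items()) + "."

table = io.StringIO("age,job,income\n34,teacher,52000\n61,retired,31000\n")
rows = list(csv.DictReader(table))
sentences = [row_to_sentence(r) for r in rows]
print(sentences[0])  # → "age is 34, job is teacher, income is 52000."
```

A language model fine-tuned with a DP optimizer on such sentences can then sample new "rows" by generating text and parsing it back into columns.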
( 2 min )
We propose a new approach to constructing a neural network for predicting
expectations of stochastic differential equations. The proposed method does not
need data sets of inputs and outputs; instead, the information obtained from
the time-evolution equations, i.e., the corresponding dual process, is directly
compared with the weights in the neural network. As a demonstration, we
construct neural networks for the Ornstein-Uhlenbeck process and the noisy van
der Pol system. The remarkable feature of learned networks with the proposed
method is the accuracy of inputs near the origin. Hence, it would be possible
to avoid the overfitting problem because the learned network does not depend on
training data sets.
( 2 min )
The estimation of causal effects is a primary goal of behavioral, social,
economic and biomedical sciences. Under the unconfoundedness condition,
adjustment for confounders requires estimating the nuisance functions relating
outcome and/or treatment to confounders. This paper considers a generalized
optimization framework for efficient estimation of general treatment effects
using feedforward artificial neural networks (ANNs) when the number of
covariates is allowed to increase with the sample size. We estimate the
nuisance function by ANNs, and develop a new approximation error bound for the
ANNs approximators when the nuisance function belongs to a mixed Sobolev space.
We show that the ANNs can alleviate the curse of dimensionality under this
circumstance. We further establish the consistency and asymptotic normality of
the proposed treatment effects estimators, and apply a weighted bootstrap
procedure for conducting inference. The proposed methods are illustrated via
simulation studies and a real data application.
( 2 min )
We study the mean estimation problem under communication and local
differential privacy constraints. While previous work has proposed
\emph{order}-optimal algorithms for the same problem (i.e., asymptotically
optimal as we spend more bits), \emph{exact} optimality (in the non-asymptotic
setting) still has not been achieved. In this work, we take a step towards
characterizing the \emph{exact}-optimal approach in the presence of shared
randomness (a random variable shared between the server and the user) and
identify several necessary conditions for \emph{exact} optimality. We prove
that one of the necessary conditions is to utilize a rotationally symmetric
shared random codebook. Based on this, we propose a randomization mechanism
where the codebook is a randomly rotated simplex -- satisfying the necessary
properties of the \emph{exact}-optimal codebook. The proposed mechanism is
based on a $k$-closest encoding which we prove to be \emph{exact}-optimal for
the randomly rotated simplex codebook.
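A minimal sketch of the codebook construction and the k-closest encoding, assuming NumPy; the privacy mechanism itself (how the reported index is randomized) is left out, so this is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                  # ambient dimension (illustrative)

# Regular simplex: d+1 points, rows of the centered identity, normalized.
S = np.eye(d + 1) - 1.0 / (d + 1)
S /= np.linalg.norm(S, axis=1, keepdims=True)

# Shared randomness: a random rotation from the QR of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d + 1, d + 1)))
codebook = S @ Q.T                     # randomly rotated simplex codewords

def k_closest(x, codebook, k):
    """k-closest encoding: indices of the k codewords nearest to x."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return np.argsort(dists)[:k]

x = rng.normal(size=d + 1)
x /= np.linalg.norm(x)
idx = k_closest(x, codebook, k=2)
print(idx)
```

The rotational symmetry comes from `Q` being drawn uniformly over orthogonal matrices, which is one of the necessary conditions the abstract identifies.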
( 2 min )
Data scientists need a consistent and reproducible environment for machine learning (ML) and data science workloads that enables managing dependencies and is secure. AWS Deep Learning Containers already provides pre-built Docker images for training and serving models in common frameworks such as TensorFlow, PyTorch, and MXNet. To improve this experience, we announced a public beta […]
( 8 min )
Customers expect quick and efficient service from businesses in today’s fast-paced world. But providing excellent customer service can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. However, businesses can meet this challenge while providing personalized and efficient customer service with the advancements in generative artificial intelligence (generative […]
( 11 min )
Amazon Personalize now enables popularity tuning for its Similar-Items recipe (aws-similar-items). Similar-Items generates recommendations that are similar to the item that a user selects, helping users discover new items in your catalog based on the previous behavior of all users and item metadata. Previously, this capability was only available for SIMS, the other Related_Items recipe […]
( 5 min )
By applying a language model to protein-drug interactions, researchers can quickly screen large libraries of potential drug compounds.
( 9 min )
The scientists used a natural language-based logical inference dataset to create smaller language models that outperformed much larger counterparts.
( 9 min )
Get into your favorite games faster by linking GeForce NOW to Steam, Epic Games Store and Ubisoft accounts. And get a peek at more games coming to GeForce NOW later this year by tuning in to Ubisoft Forward on Monday, June 12, when the game publisher will reveal its latest news and announcements. Plus, two Read article >
( 5 min )
Emre Kiciman and Amit Sharma join Ashley Llorens to discuss the causal capabilities of LLMs and ongoing journeys with GPT-3.5 and GPT-4 in the newest episode of the Microsoft Research Podcast series, "AI Frontiers."
The post AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma appeared first on Microsoft Research.
( 30 min )
We introduce a randomized topological augmentor based on Schur complements
for Graph Contrastive Learning (GCL). Given a graph laplacian matrix, the
technique generates unbiased approximations of its Schur complements and treats
the corresponding graphs as augmented views. We discuss the benefits of our
approach, provide theoretical justifications and present connections with graph
diffusion. Unlike previous efforts, we study the empirical effectiveness of the
augmentor in a controlled fashion by varying the design choices for subsequent
GCL phases, such as encoding and contrasting. Extensive experiments on node and
graph classification benchmarks demonstrate that our technique consistently
outperforms pre-defined and adaptive augmentation approaches to achieve
state-of-the-art results.
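For intuition, the (exact) Schur complement of a graph Laplacian onto a subset of nodes is itself a Laplacian on that subset, preserving the effective resistances between the kept nodes; the augmentor uses randomized unbiased approximations of this quantity. A small NumPy sketch on a 4-node path graph:

```python
import numpy as np

# Laplacian of a path graph 1-2-3-4 (illustrative).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

keep = [0, 1]          # nodes kept in the augmented view
elim = [2, 3]          # nodes eliminated by the Schur complement

L_kk = L[np.ix_(keep, keep)]
L_ke = L[np.ix_(keep, elim)]
L_ee = L[np.ix_(elim, elim)]

# Schur complement: a Laplacian of a smaller graph on the kept nodes.
S = L_kk - L_ke @ np.linalg.inv(L_ee) @ L_ke.T
print(np.round(S, 3))   # → [[1, -1], [-1, 1]]: a single edge between 0 and 1
```

Eliminating nodes 2 and 3 of the path leaves exactly the edge between nodes 0 and 1, and the result has zero row sums, as a Laplacian must.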
( 2 min )
Efficient large-scale neural network training and inference on commodity CPU
hardware is of immense practical significance in democratizing deep learning
(DL) capabilities. Presently, the process of training massive models consisting
of hundreds of millions to billions of parameters requires the extensive use of
specialized hardware accelerators, such as GPUs, which are only accessible to a
limited number of institutions with considerable financial resources. Moreover,
there is often an alarming carbon footprint associated with training and
deploying these models. In this paper, we take a step towards addressing these
challenges by introducing BOLT, a sparse deep learning library for training
large-scale search and recommendation models on standard CPU hardware. BOLT
provides a flexible, high-level API for constructing models that will be
familiar to users of existing popular DL frameworks. By automatically tuning
specialized hyperparameters, BOLT also abstracts away the algorithmic details
of sparse network training. We evaluate BOLT on a number of information
retrieval tasks including product recommendations, text classification, graph
neural networks, and personalization. We find that our proposed system achieves
competitive performance with state-of-the-art techniques at a fraction of the
cost and energy consumption and an order-of-magnitude faster inference time.
BOLT has also been successfully deployed by multiple businesses to address
critical problems, and we highlight one customer deployment case study in the
field of e-commerce.
( 3 min )
In a context of malicious software detection, machine learning (ML) is widely
used to generalize to new malware. However, it has been demonstrated that ML
models can be fooled or may have generalization problems on malware that has
never been seen. We investigate the possible benefits of quantum algorithms for
classification tasks. We implement two models of Quantum Machine Learning
algorithms, and we compare them to classical models for the classification of a
dataset composed of malicious and benign executable files. We try to optimize
our algorithms based on methods found in the literature, and analyze our
results in an exploratory way, to identify the most interesting directions to
explore for the future.
( 2 min )
This paper proposes Meta-SAGE, a novel approach for improving the scalability
of deep reinforcement learning models for combinatorial optimization (CO)
tasks. Our method adapts pre-trained models to larger-scale problems in test
time by suggesting two components: a scale meta-learner (SML) and scheduled
adaptation with guided exploration (SAGE). First, SML transforms the context
embedding for subsequent adaptation of SAGE based on scale information. Then,
SAGE adjusts the model parameters dedicated to the context embedding for a
specific instance. SAGE introduces locality bias, which encourages selecting
nearby locations to determine the next location. The locality bias gradually
decays as the model is adapted to the target instance. Results show that
Meta-SAGE outperforms previous adaptation methods and significantly improves
scalability in representative CO tasks. Our source code is available at
https://github.com/kaist-silab/meta-sage
( 2 min )
Computer vision applications in transportation logistics and warehousing have
a huge potential for process automation. We present a structured literature
review on research in the field to help leverage this potential. The literature
is categorized w.r.t. the application, i.e. the task it tackles, and w.r.t. the
computer vision techniques that are used. Regarding applications, we subdivide
the literature into two areas: monitoring, i.e. observing and retrieving relevant
information from the environment, and manipulation, where approaches are used
to analyze and interact with the environment. Additionally, we point out
directions for future research and link to recent developments in computer
vision that are suitable for application in logistics. Finally, we present an
overview of existing datasets and industrial solutions. The results of our
analysis are also available online at https://a-nau.github.io/cv-in-logistics.
( 2 min )
Large language models (LLMs) with memory are computationally universal.
However, mainstream LLMs are not taking full advantage of memory, and the
designs are heavily influenced by biological brains. Due to their approximate
nature and proneness to the accumulation of errors, conventional neural memory
mechanisms cannot support LLMs to simulate complex reasoning. In this paper, we
seek inspiration from modern computer architectures to augment LLMs with
symbolic memory for complex multi-hop reasoning. Such a symbolic memory
framework is instantiated as an LLM and a set of SQL databases, where the LLM
generates SQL instructions to manipulate the SQL databases. We validate the
effectiveness of the proposed memory framework on a synthetic dataset requiring
complex reasoning. The project website is available at
https://chatdatabase.github.io/ .
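A minimal sketch of the idea with sqlite3: the database acts as exact symbolic memory, so multi-hop lookups do not accumulate error. Here the "generated" SQL statements are hand-written stand-ins for what an LLM would emit, and the schema is illustrative:

```python
import sqlite3

# Symbolic memory: an SQL database the model can write to and read from.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")

# In the paper's framework the LLM emits SQL; these statements are
# hand-written stand-ins for illustration.
generated_sql = [
    ("INSERT INTO facts VALUES (?, ?, ?)", ("alice", "manages", "bob")),
    ("INSERT INTO facts VALUES (?, ?, ?)", ("bob", "manages", "carol")),
]
for stmt, params in generated_sql:
    db.execute(stmt, params)

# Multi-hop query: who does Alice manage transitively (two hops)?
hop2 = db.execute(
    """SELECT f2.object FROM facts f1 JOIN facts f2
       ON f1.object = f2.subject WHERE f1.subject = 'alice'"""
).fetchall()
print(hop2)   # → [('carol',)]
```

Because the join is computed symbolically rather than in neural activations, the two-hop answer is exact, which is the motivation for pairing an LLM with SQL memory.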
( 2 min )
Automatic speech recognition (ASR) models are frequently exposed to data
distribution shifts in many real-world scenarios, leading to erroneous
predictions. To tackle this issue, an existing test-time adaptation (TTA)
method has recently been proposed to adapt the pre-trained ASR model on
unlabeled test instances without source data. Despite decent performance gain,
this work relies solely on naive greedy decoding and performs adaptation across
timesteps at a frame level, which may not be optimal given the sequential
nature of the model output. Motivated by this, we propose a novel TTA
framework, dubbed SGEM, for general ASR models. To treat the sequential output,
SGEM first exploits beam search to explore candidate output logits and selects
the most plausible one. Then, it utilizes generalized entropy minimization and
negative sampling as unsupervised objectives to adapt the model. SGEM achieves
state-of-the-art performance for three mainstream ASR models under various
domain shifts.
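As a rough illustration of entropy-based candidate selection (using plain Shannon entropy as a stand-in for the paper's generalized entropy, and fixed logits instead of an actual beam search), lower-entropy hypotheses are preferred:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sequence_entropy(logits):
    """Mean Shannon entropy over timesteps (a simple stand-in for the
    paper's generalized entropy objective)."""
    p = softmax(logits)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

# Two candidate hypotheses: logits over 5 timesteps x 10 symbols.
uniform = np.zeros((5, 10))                       # maximally uncertain
peaked = np.zeros((5, 10)); peaked[:, 0] = 5.0    # confident hypothesis

entropies = [sequence_entropy(uniform), sequence_entropy(peaked)]
best = int(np.argmin(entropies))
print(best)   # → 1: the low-entropy (most confident) candidate is selected
```

In the actual method this score is minimized by gradient updates to the model at test time, adapting it to the shifted domain without labels.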
( 2 min )
Ultrasound imaging is one of the most prominent technologies to evaluate the
growth, progression, and overall health of a fetus during its gestation.
However, the interpretation of the data obtained from such studies is best left
to expert physicians and technicians who are trained and well-versed in
analyzing such images. To improve the clinical workflow and potentially develop
an at-home ultrasound-based fetal monitoring platform, we present a novel fetus
phantom ultrasound dataset, FPUS23, which can be used to identify (1) the
correct diagnostic planes for estimating fetal biometric values, (2) fetus
orientation, (3) their anatomical features, and (4) bounding boxes of the fetus
phantom anatomies at 23 weeks gestation. The entire dataset is composed of
15,728 images, which are used to train four different Deep Neural Network
models, built upon a ResNet34 backbone, for detecting the aforementioned fetus
features and use-cases. We have also evaluated the models trained using our
FPUS23 dataset, to show that the information learned by these models can be
used to substantially increase the accuracy on real-world ultrasound fetus
datasets. We make the FPUS23 dataset and the pre-trained models publicly
accessible at https://github.com/bharathprabakaran/FPUS23, which will further
facilitate future research on fetal ultrasound imaging and analysis.
( 3 min )
Control variates can be a powerful tool to reduce the variance of Monte Carlo
estimators, but constructing effective control variates can be challenging when
the number of samples is small. In this paper, we show that when a large number
of related integrals need to be computed, it is possible to leverage the
similarity between these integration tasks to improve performance even when the
number of samples per task is very small. Our approach, called meta learning
CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our
empirical assessment indicates that Meta-CVs can lead to significant variance
reduction in such settings, and our theoretical analysis establishes general
conditions under which Meta-CVs can be successfully trained.
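The underlying control-variate estimator can be sketched in a few lines. This is a classical single-task example, not the meta-learned construction: we estimate E[e^X] for X ~ Uniform(0,1) using g(X) = X, whose mean 1/2 is known.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                # small per-task sample size
x = rng.uniform(size=n)

f = np.exp(x)                          # integrand: true E[e^X] = e - 1
g = x                                  # control variate with known mean 1/2

# Near-optimal coefficient beta = Cov(f, g) / Var(g), estimated from samples.
beta = np.cov(f, g)[0, 1] / np.var(g, ddof=1)
cv_estimate = f.mean() - beta * (g.mean() - 0.5)

plain = f.mean()
print(plain, cv_estimate)   # both ≈ e - 1 ≈ 1.71828; the CV estimate has
                            # much lower variance across repetitions
```

Meta-CVs address the regime where `n` per task is too small to fit a good control variate from scratch, by sharing structure across many related integrals.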
( 2 min )
Deciding how to optimally deploy sensors in a large, complex, and spatially
extended structure is critical to ensure that the surface pressure field is
accurately captured for subsequent analysis and design. In some cases,
reconstruction of missing data is required in downstream tasks such as the
development of digital twins. This paper presents a data-driven sparse sensor
selection algorithm, aiming to provide the most information contents for
reconstructing aerodynamic characteristics of wind pressures over tall building
structures parsimoniously. The algorithm first fits a set of basis functions to
the training data, then applies a computationally efficient QR algorithm that
ranks existing pressure sensors in order of importance based on the state
reconstruction to this tailored basis. The findings of this study show that the
proposed algorithm successfully reconstructs the aerodynamic characteristics of
tall buildings from sparse measurement locations, generating stable and optimal
solutions across a range of conditions. As a result, this study serves as a
promising first step toward leveraging the success of data-driven and machine
learning algorithms to supplement traditional genetic algorithms currently used
in wind engineering.
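A greedy column-pivoted QR ranking can be sketched as follows. The basis, field, and sensor counts are illustrative, and this is a simplified reading of the approach rather than the authors' implementation:

```python
import numpy as np

def qr_pivot_sensors(Psi, k):
    """Greedy column-pivoted QR on Psi.T: ranks measurement locations by
    how much new information each adds for reconstruction in basis Psi."""
    M = Psi.T.copy()                    # columns = candidate sensor locations
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(M, axis=0)
        norms[chosen] = -1.0            # never re-pick a location
        j = int(np.argmax(norms))
        chosen.append(j)
        q = M[:, j] / np.linalg.norm(M[:, j])
        M -= np.outer(q, q @ M)         # deflate: remove captured direction
    return chosen

rng = np.random.default_rng(0)
n_locations, r = 50, 3
Psi, _ = np.linalg.qr(rng.normal(size=(n_locations, r)))  # tailored basis

sensors = qr_pivot_sensors(Psi, k=r)

# Reconstruct a field lying in the basis from only the chosen sensors.
a_true = np.array([1.0, -2.0, 0.5])
field = Psi @ a_true
a_hat = np.linalg.solve(Psi[sensors, :], field[sensors])
print(np.allclose(a_hat, a_true))    # → True: exact recovery from 3 sensors
```

With as many well-placed sensors as basis modes, any field in the span of the basis is recovered exactly; in practice more sensors are kept for robustness to noise.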
( 2 min )
In this paper, we propose a Boosting Tail Neural Network (BTNN) for improving
the performance of Realtime Custom Keyword Spotting (RCKS), which remains an
industrial challenge because it demands powerful classification ability with
limited computational resources. Our approach is inspired by brain science,
where only part of the brain is activated by a given nerve stimulus, and by the
many machine learning algorithms that combine a batch of weak classifiers to
resolve arduous problems, which have often proved effective. We show that this
method is helpful for the RCKS problem: the proposed approach achieves better
performance in terms of wakeup rate and false alarms. In our experiments,
compared with traditional algorithms that use only one strong classifier, it
obtains an 18% relative improvement. We also point out that this approach may
be promising for future ASR exploration.
( 2 min )
We present AnalogVNN, a simulation framework built on PyTorch that simulates
the effects of optoelectronic noise, limited precision, and signal normalization
present in photonic neural network accelerators. We use this framework to train
and optimize linear and convolutional neural networks with up to 9 layers and
~1.7 million parameters, while gaining insights into how normalization,
activation function, reduced precision, and noise influence accuracy in analog
photonic neural networks. By following the same layer structure design present
in PyTorch, the AnalogVNN framework allows users to convert most digital neural
network models to their analog counterparts with just a few lines of code,
taking full advantage of the open-source optimization, deep learning, and GPU
acceleration libraries available through PyTorch. Code is available at
https://analogvnn.github.io
( 2 min )
Information on natural phenomena and engineering systems is typically
contained in data. Data can be corrupted by systematic errors in models and
experiments. In this paper, we propose a tool to uncover the spatiotemporal
solution of the underlying physical system by removing the systematic errors
from data. The tool is the physics-constrained convolutional neural network
(PC-CNN), which combines information from both the system's governing equations
and data. We focus on fundamental phenomena that are modelled by partial
differential equations, such as linear convection, Burgers equation, and
two-dimensional turbulence. First, we formulate the problem, describe the
physics-constrained convolutional neural network, and parameterise the
systematic error. Second, we uncover the solutions from data corrupted by large
multimodal systematic errors. Third, we perform a parametric study for
different systematic errors. We show that the method is robust. Fourth, we
analyse the physical properties of the uncovered solutions. We show that the
solutions inferred from the PC-CNN are physical, in contrast to the data
corrupted by systematic errors, which do not fulfil the governing equations.
This work opens opportunities for removing epistemic errors from models, and
systematic errors from measurements.
( 2 min )
We study robustness to test-time adversarial attacks in the regression
setting with $\ell_p$ losses and arbitrary perturbation sets. We address the
question of which function classes are PAC learnable in this setting. We show
that classes of finite fat-shattering dimension are learnable in both
realizable and agnostic settings. Moreover, for convex function classes, they
are even properly learnable. In contrast, some non-convex function classes
provably require improper learning algorithms. Our main technique is based on a
construction of an adversarially robust sample compression scheme of a size
determined by the fat-shattering dimension. Along the way, we introduce a novel
agnostic sample compression scheme for real-valued functions, which may be of
independent interest.
( 2 min )
Principal components analysis (PCA) is a fundamental algorithm in data
analysis. Its memory-restricted online versions are useful in many modern
applications, where the data are too large to fit in memory, or when data
arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two
online PCA algorithms that are based on rank-one updates. While ROIPCA is
typically more accurate, fROIPCA is faster and has comparable accuracy. We show
the relation between fROIPCA and an existing popular gradient algorithm for
online PCA, and in particular, prove that fROIPCA is in fact a gradient
algorithm with an optimal learning rate. We demonstrate numerically the
advantages of our algorithms over existing state-of-the-art algorithms in terms
of accuracy and runtime.
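For context, the classical gradient-style online PCA update is Oja's rule, sketched below on a synthetic stream. This illustrates the family of gradient algorithms that fROIPCA is related to, not ROIPCA/fROIPCA themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stream whose leading principal direction is e_1 (variance 9 vs 1).
def sample():
    return np.array([3.0, 1.0]) * rng.normal(size=2)

w = rng.normal(size=2)
w /= np.linalg.norm(w)

for t in range(1, 5001):
    x = sample()
    y = w @ x
    w += (1.0 / t) * y * (x - y * w)    # Oja's rule: gradient-style update
    w /= np.linalg.norm(w)              # keep the estimate on the unit sphere

print(np.abs(w))    # ≈ [1, 0]: converged to the top principal component
```

Each sample triggers a rank-one correction to the current direction estimate, which is the same structural idea (rank-one updates per item) that ROIPCA builds on, with the learning rate governing accuracy.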
( 2 min )
We provide new estimates of an asymptotic upper bound on the entropy of
English using the large language model LLaMA-7B as a predictor for the next
token given a window of past tokens. This estimate is significantly smaller
than currently available estimates in \cite{cover1978convergent},
\cite{lutati2023focus}. A natural byproduct is an algorithm for lossless
compression of English text which combines the prediction from the large
language model with a lossless compression scheme. Preliminary results from
limited experiments suggest that our scheme outperforms state-of-the-art text
compression schemes such as BSC, ZPAQ, and paq8h.
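The estimator itself is just the predictor's average negative log2-probability on held-out text, which upper-bounds the entropy rate. A toy sketch with an add-one-smoothed bigram model standing in for LLaMA-7B (illustrative only):

```python
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat the cat ran to the mat " * 50
tokens = text.split()

# Fit bigram counts on the first half, evaluate on the second half.
split = len(tokens) // 2
bigrams = defaultdict(Counter)
for a, b in zip(tokens[:split], tokens[1:split]):
    bigrams[a][b] += 1

vocab = set(tokens)

def prob(a, b):
    """Add-one-smoothed bigram probability: a toy stand-in for LLaMA-7B."""
    c = bigrams[a]
    return (c[b] + 1) / (sum(c.values()) + len(vocab))

# Average -log2 p(next token | context) upper-bounds the entropy rate.
nll = [-math.log2(prob(a, b))
       for a, b in zip(tokens[split:], tokens[split + 1:])]
bound = sum(nll) / len(nll)
print(round(bound, 3), "bits/token")
```

A better next-token predictor drives this average down, which is also why the same probabilities can feed an arithmetic coder to compress the text to roughly `bound` bits per token.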
( 2 min )
We present a comprehensive analysis of quantitatively evaluating explainable
artificial intelligence (XAI) techniques for remote sensing image
classification. Our approach leverages state-of-the-art machine learning
approaches to perform remote sensing image classification across multiple
modalities. We investigate the results of the models qualitatively through XAI
methods. Additionally, we compare the XAI methods quantitatively through
various categories of desired properties. Through our analysis, we offer
insights and recommendations for selecting the most appropriate XAI method(s)
to gain a deeper understanding of the models' decision-making processes. The
code for this work is publicly available.
( 2 min )
In this paper, we examine the problem of partial inference in the context of
structured prediction. Using a generative model approach, we consider the task
of maximizing a score function with unary and pairwise potentials in the space
of labels on graphs. Employing a two-stage convex optimization algorithm for
label recovery, we analyze the conditions under which a majority of the labels
can be recovered. We introduce a novel perspective on the Karush-Kuhn-Tucker
(KKT) conditions and primal and dual construction, and provide statistical and
topological requirements for partial recovery with provable guarantees.
( 2 min )
The Japanese writing system is complex, with three character types of
Hiragana, Katakana, and Kanji. Kanji consists of thousands of unique
characters, further adding to the complexity of character identification and
literature understanding. Being able to translate handwritten Japanese
characters into digital text is useful for data analysis, translation,
learning, and cultural preservation. In this study, a machine learning approach to
analyzing and recognizing handwritten Japanese characters (Kanji) is proposed.
The study used an ensemble of three convolutional neural networks (CNNs) for
recognizing handwritten Kanji characters and utilized four datasets of MNIST,
K-MNIST, Kuzushiji-49 (K49) and the top 150 represented classes in the
Kuzushiji-Kanji (K-Kanji) dataset for its performance evaluation. The results
indicate the feasibility of the proposed CNN-ensemble architecture for
recognizing handwritten characters, achieving 99.4%, 96.4%, 95.0%, and 96.4%
classification accuracy on the MNIST, K-MNIST, K49, and K-Kanji datasets,
respectively.
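A minimal sketch of one common way to combine three CNNs, averaging their softmax outputs and taking the argmax; the paper's exact fusion rule is an assumption here, and the logits below are mock values rather than real network outputs:

```python
import numpy as np

def ensemble_predict(logits_list):
    """Average the softmax outputs of several classifiers, then argmax.
    A standard ensembling scheme; the paper's exact rule may differ."""
    probs = []
    for logits in logits_list:
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs.append(e / e.sum(axis=1, keepdims=True))
    return np.mean(probs, axis=0).argmax(axis=1)

# Three mock "CNN" outputs for 2 samples over 3 classes.
l1 = np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]])
l2 = np.array([[1.5, 0.2, 0.1], [0.2, 0.1, 1.8]])
l3 = np.array([[1.8, 0.1, 0.3], [0.1, 0.3, 1.5]])
preds = ensemble_predict([l1, l2, l3])
```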
( 2
min )
Approximate inference in Gaussian process (GP) models with non-conjugate
likelihoods gets entangled with the learning of the model hyperparameters. We
improve hyperparameter learning in GP models and focus on the interplay between
variational inference (VI) and the learning target. While VI's lower bound to
the marginal likelihood is a suitable objective for inferring the approximate
posterior, we show that a direct approximation of the marginal likelihood as in
Expectation Propagation (EP) is a better learning objective for hyperparameter
optimization. We design a hybrid training procedure to bring the best of both
worlds: it leverages conjugate-computation VI for inference and uses an EP-like
marginal likelihood approximation for hyperparameter learning. We compare VI,
EP, Laplace approximation, and our proposed training procedure and empirically
demonstrate the effectiveness of our proposal across a wide range of data sets.
( 2
min )
Despite progress in the field, significant parts of current XAI research are
still not on solid conceptual, ethical, or methodological grounds.
Unfortunately, these unfounded parts are not on the decline but continue to
grow. Many explanation techniques are still proposed without clarifying their
purpose. Instead, they are advertised with ever more fancy-looking heatmaps or
only seemingly relevant benchmarks. Moreover, explanation techniques are
motivated with questionable goals, such as building trust, or rely on strong
assumptions about the 'concepts' that deep learning algorithms learn. In this
paper, we highlight and discuss these and other misconceptions in current XAI
research. We also suggest steps to make XAI a more substantive area of
research.
( 2
min )
We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown, it is unclear whether an
interpretation succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/deepmind/tracr.
( 2
min )
We study robustness to test-time adversarial attacks in the regression
setting with $\ell_p$ losses and arbitrary perturbation sets. We address the
question of which function classes are PAC learnable in this setting. We show
that classes of finite fat-shattering dimension are learnable in both
realizable and agnostic settings. Moreover, for convex function classes, they
are even properly learnable. In contrast, some non-convex function classes
provably require improper learning algorithms. Our main technique is based on a
construction of an adversarially robust sample compression scheme of a size
determined by the fat-shattering dimension. Along the way, we introduce a novel
agnostic sample compression scheme for real-valued functions, which may be of
independent interest.
( 2
min )
It is often very challenging to manually design reward functions for complex,
real-world tasks. To solve this, one can instead use reward learning to infer a
reward function from data. However, there are often multiple reward functions
that fit the data equally well, even in the infinite-data limit. This means
that the reward function is only partially identifiable. In this work, we
formally characterise the partial identifiability of the reward function given
several popular reward learning data sources, including expert demonstrations
and trajectory comparisons. We also analyse the impact of this partial
identifiability for several downstream tasks, such as policy optimisation. We
unify our results in a framework for comparing data sources and downstream
tasks by their invariances, with implications for the design and selection of
data sources for reward learning.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of structured symmetric positive-definite matrices
with the affine-invariant metric. We do so by proposing a generalized version
of the Riemannian normal coordinates that dynamically orthonormalizes the
metric and locally converts the problem into an unconstrained problem in the
Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning in low precision settings.
Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
Principal components analysis (PCA) is a fundamental algorithm in data
analysis. Its memory-restricted online versions are useful in many modern
applications, where the data are too large to fit in memory, or when data
arrive as a stream of items. In this paper, we propose ROIPCA and fROIPCA, two
online PCA algorithms that are based on rank-one updates. While ROIPCA is
typically more accurate, fROIPCA is faster and has comparable accuracy. We show
the relation between fROIPCA and an existing popular gradient algorithm for
online PCA, and in particular, prove that fROIPCA is in fact a gradient
algorithm with an optimal learning rate. We demonstrate numerically the
advantages of our algorithms over existing state-of-the-art algorithms in terms
of accuracy and runtime.
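For context, the family of gradient algorithms for online PCA that fROIPCA is related to can be illustrated with Oja's rule for the top principal component. This is a generic sketch under a fixed learning rate, not the ROIPCA/fROIPCA updates themselves:

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """One step of Oja's rule for the leading principal component,
    with renormalization to keep the estimate on the unit sphere."""
    y = w @ x
    w = w + lr * y * (x - y * w)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
w = rng.normal(size=3)
w /= np.linalg.norm(w)
# Stream of samples whose dominant variance lies along the first axis.
for _ in range(2000):
    x = rng.normal(size=3) * np.array([3.0, 0.5, 0.5])
    w = oja_update(w, x)
# w now aligns (up to sign) with the leading principal direction.
```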
( 2
min )
Prediction models are typically optimized independently from decision
optimization. A smart predict then optimize (SPO) framework optimizes
prediction models to minimize downstream decision regret. In this paper we
present dboost, the first general purpose implementation of smart gradient
boosting for `predict, then optimize' problems. The framework supports convex
quadratic cone programming and gradient boosting is performed by implicit
differentiation of a custom fixed-point mapping. Experiments comparing with
state-of-the-art SPO methods show that dboost can further reduce out-of-sample
decision regret.
( 2
min )
Empirical neural tangent kernels (eNTKs) can provide a good understanding of
a given network's representation: they are often far less expensive to compute
and applicable more broadly than infinite-width NTKs. For networks with $O$
output units (e.g., an $O$-class classifier), however, the eNTK on $N$ inputs
is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$
computation. Most existing applications have therefore used one of a handful of
approximations yielding $N \times N$ kernel matrices, saving orders of
magnitude of computation, but with limited to no justification. We prove that
one such approximation, which we call "sum of logits", converges to the true
eNTK at initialization for any network with a wide final "readout" layer. Our
experiments demonstrate the quality of this approximation for various uses
across a range of settings.
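The size gap can be made concrete with per-example Jacobians. This is a hedged numpy sketch of the bookkeeping only, with random arrays standing in for the Jacobians of a real network:

```python
import numpy as np

# Per-example Jacobians J[n, o, p] = d f_o(x_n) / d theta_p for a toy
# setting with N=2 inputs, O=3 outputs, P=4 parameters (random stand-ins).
rng = np.random.default_rng(0)
J = rng.normal(size=(2, 3, 4))

# Full eNTK: an (N*O) x (N*O) matrix of Jacobian inner products.
full = np.einsum("nop,mqp->nomq", J, J).reshape(2 * 3, 2 * 3)

# "Sum of logits" approximation: differentiate the scalar sum of outputs,
# giving an N x N kernel instead of (N*O) x (N*O).
g = J.sum(axis=1)   # gradient of sum_o f_o, shape (N, P)
approx = g @ g.T    # N x N
```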
( 2
min )
We propose a novel Bayesian inference framework for distributed
differentially private linear regression. We consider a distributed setting
where multiple parties hold parts of the data and share certain summary
statistics of their portions in privacy-preserving noise. We develop a novel
generative statistical model for privately shared statistics, which exploits a
useful distributional relation between the summary statistics of linear
regression. Bayesian estimation of the regression coefficients is conducted
mainly using Markov chain Monte Carlo algorithms, while we also provide a fast
version to perform Bayesian estimation in one iteration. The proposed methods
have computational advantages over their competitors. We provide numerical
results on both real and simulated data, which demonstrate that the proposed
algorithms provide well-rounded estimation and prediction.
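A toy sketch of the setting, not the paper's generative model or MCMC procedure: parties release noise-perturbed sufficient statistics of linear regression, from which coefficients can still be recovered when the data are plentiful. The noise scale here is illustrative, not a calibrated privacy mechanism:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Privately shared summary statistics: Gaussian noise added to X'X and X'y.
S = X.T @ X + rng.normal(scale=1.0, size=(d, d))
z = X.T @ y + rng.normal(scale=1.0, size=d)

# A simple point estimate from the noisy statistics (the paper instead
# performs full Bayesian inference over the coefficients).
beta_hat = np.linalg.solve(S, z)
```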
( 2
min )
Training large language models (LLMs) with billions of parameters can be challenging. In addition to designing the model architecture, researchers need to set up state-of-the-art training techniques for distributed training like mixed precision support, gradient accumulation, and checkpointing. With large models, the training setup is even more challenging because the available memory in a single […]
( 7
min )
You can now retrain machine learning (ML) models and automate batch prediction workflows with updated datasets in Amazon SageMaker Canvas, thereby making it easier to constantly learn and improve the model performance and drive efficiency. An ML model’s effectiveness depends on the quality and relevance of the data it’s trained on. As time progresses, the […]
( 10
min )
Amazon Lex is excited to announce Test Workbench, a new bot testing solution that provides tools to simplify and automate the bot testing process. During bot development, testing is the phase where developers check whether a bot meets the specific requirements, needs and expectations by identifying errors, defects, or bugs in the system before scaling. […]
( 9
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract has a Tables feature within the AnalyzeDocument API that offers the ability to automatically extract tabular structures from any document. In this post, we discuss the improvements made to the Tables feature and […]
( 9
min )
This blog post is co-written with Dr. Ebtesam Almazrouei, Executive Director–Acting Chief AI Researcher of the AI-Cross Center Unit and Project Lead for LLM Projects at TII. United Arab Emirate’s (UAE) Technology Innovation Institute (TII), the applied research pillar of Abu Dhabi’s Advanced Technology Research Council, has launched Falcon LLM, a foundational large language model […]
( 10
min )
In the latest episode of NVIDIA’s AI Podcast, Anant Agarwal, founder of edX and chief platform officer at 2U, shared his vision for the future of online education and how AI is revolutionizing the learning experience. Agarwal, a strong advocate for massive open online courses, or MOOCs, discussed the importance of accessibility and quality in Read article >
( 4
min )
Getting discharged from the hospital is a major milestone for patients — but sometimes, it’s not the end of their road to recovery. Nearly 15% of hospital patients in the U.S. are readmitted within 30 days of their initial discharge, which is often associated with worse outcomes and higher costs for both patients and hospitals. Read article >
( 6
min )
In this issue: Peter Lee discusses AI in medicine. Plus, new research on data inference privacy in machine learning; PII leakage in language models; and automatic prompt organization with gradient descent and beam search.
The post Research Focus: Week of June 5, 2023 appeared first on Microsoft Research.
( 11
min )
Can you imagine a world where healthcare is more accessible, affordable, and efficient? Conversational AI is making this vision a reality. With the help of natural language processing (NLP) and machine learning (ML), conversational AI is transforming the way healthcare providers interact with patients. From scheduling appointments to monitoring health conditions, conversational AI has numerous… Read More »The impact of conversational AI on healthcare outcomes and patient satisfaction
The post The impact of conversational AI on healthcare outcomes and patient satisfaction appeared first on Data Science Central.
( 22
min )
In recent years, the web development industry has shifted towards Progressive Web Apps (PWAs) as the future of web development. PWAs are web applications that provide users with an app-like experience on their mobile devices. They do not have to download or install a separate native app. This emerging technology provides several benefits, including faster… Read More »Why are progressive web apps becoming the future of web development?
The post Why are progressive web apps becoming the future of web development? appeared first on Data Science Central.
( 22
min )
It has been reported that clustering-based topic models, which cluster
high-quality sentence embeddings with an appropriate word selection method, can
generate better topics than generative probabilistic topic models. However,
these approaches suffer from the inability to select appropriate parameters and
incomplete models that overlook the quantitative relations between words and
topics and between topics and texts. To solve these issues, we propose graph to
topic
(G2T), a simple but effective framework for topic modelling. The framework is
composed of four modules. First, document representation is acquired using
pretrained language models. Second, a semantic graph is constructed according
to the similarity between document representations. Third, communities in
document semantic graphs are identified, and the relationship between topics
and documents is quantified accordingly. Fourth, the word--topic distribution
is computed based on a variant of TFIDF. Automatic evaluation suggests that G2T
achieved state-of-the-art performance on both English and Chinese documents
with different lengths.
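A toy rendition of G2T's middle steps: a semantic graph built by thresholding cosine similarity, with connected components standing in for the (more sophisticated) community detection the framework actually uses. The embeddings are mock vectors, not pretrained-LM outputs:

```python
import numpy as np

def topic_communities(emb, threshold=0.8):
    """Build a semantic graph from cosine similarities and return a
    community label per document (connected components as a stand-in)."""
    norms = np.linalg.norm(emb, axis=1)
    sims = emb @ emb.T / (norms[:, None] * norms[None, :])
    adj = sims > threshold
    n = len(emb)
    labels = -np.ones(n, dtype=int)
    for i in range(n):
        if labels[i] == -1:
            stack, labels[i] = [i], i
            while stack:
                j = stack.pop()
                for k in np.where(adj[j] & (labels == -1))[0]:
                    labels[k] = i
                    stack.append(k)
    return labels

# Two clearly separated groups of "document embeddings".
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = topic_communities(emb)
```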
( 2
min )
We propose causal isotonic calibration, a novel nonparametric method for
calibrating predictors of heterogeneous treatment effects. Furthermore, we
introduce cross-calibration, a data-efficient variant of calibration that
eliminates the need for hold-out calibration sets. Cross-calibration leverages
cross-fitted predictors and generates a single calibrated predictor using all
available data. Under weak conditions that do not assume monotonicity, we
establish that both causal isotonic calibration and cross-calibration achieve
fast doubly-robust calibration rates, as long as either the propensity score or
outcome regression is estimated accurately in a suitable sense. The proposed
causal isotonic calibrator can be wrapped around any black-box learning
algorithm, providing robust and distribution-free calibration guarantees while
preserving predictive performance.
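Isotonic calibration builds on the pool-adjacent-violators algorithm (PAVA). A minimal PAVA sketch follows; the causal, cross-fitted machinery of the paper is not reproduced here:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: the least-squares monotone fit to y.
    Each block stores [mean, weight]; violating blocks are merged."""
    out = []
    for v in y:
        out.append([v, 1.0])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, w2 = out.pop()
            m1, w1 = out.pop()
            w = w1 + w2
            out.append([(m1 * w1 + m2 * w2) / w, w])
    fit = []
    for m, w in out:
        fit.extend([m] * int(w))
    return np.array(fit)

# Sorting observations by predicted treatment effect and running PAVA on
# the outcomes yields a monotone (calibrated) mapping.
fit = pava([1.0, 3.0, 2.0, 4.0])
```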
( 2
min )
We introduce Brain-Inspired Modular Training (BIMT), a method for making
neural networks more modular and interpretable. Inspired by brains, BIMT embeds
neurons in a geometric space and augments the loss function with a cost
proportional to the length of each neuron connection. We demonstrate that BIMT
discovers useful modular neural networks for many simple tasks, revealing
compositional structures in symbolic formulas, interpretable decision
boundaries and features for classification, and mathematical structure in
algorithmic datasets. The ability to directly see modules with the naked eye
can complement current mechanistic interpretability strategies such as probes,
interventions or staring at all weights.
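The connection-length cost can be sketched in a few lines: each weight is penalized by its magnitude times the distance between the neurons it connects. The 1-D neuron layout and the penalty's exact form are illustrative assumptions, not BIMT's precise formulation:

```python
import numpy as np

def bimt_penalty(weights, pos_in, pos_out, lam=0.1):
    """Length-weighted L1 cost: |w_ij| * distance(neuron_i, neuron_j),
    encouraging spatially local (modular) wiring."""
    dist = np.abs(pos_out[:, None] - pos_in[None, :])  # 1-D embedding
    return lam * np.sum(np.abs(weights) * dist)

W = np.array([[1.0, 0.0], [0.0, 1.0]])
p_in = np.array([0.0, 1.0])
p_out = np.array([0.0, 1.0])
# Aligned (local) connections cost nothing; crossing wires would.
penalty = bimt_penalty(W, p_in, p_out)
```

During training, this term would be added to the task loss so that gradient descent trades accuracy against wiring length.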
( 2
min )
In many machine learning applications, labeling datasets can be an arduous
and time-consuming task. Although research has shown that semi-supervised
learning techniques can achieve high accuracy with very few labels within the
field of computer vision, little attention has been given to how images within
a dataset should be selected for labeling. In this paper, we propose a novel
approach based on well-established self-supervised learning, clustering, and
manifold learning techniques that address this challenge of selecting an
informative image subset to label in the first instance, which is known as the
cold-start or unsupervised selective labelling problem. We test our approach
using several publicly available datasets, namely CIFAR10, Imagenette,
DeepWeeds, and EuroSAT, and observe improved performance with both supervised
and semi-supervised learning strategies when our label selection strategy is
used, in comparison to random sampling. We also obtain superior performance for
the datasets considered with a much simpler approach compared to other methods
in the literature.
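As a simple stand-in for the paper's selection pipeline (which combines self-supervised embeddings, clustering, and manifold learning), greedy farthest-point sampling illustrates the idea of picking a diverse initial subset to label:

```python
import numpy as np

def farthest_point_selection(emb, k):
    """Greedily pick k diverse points: start from index 0, then repeatedly
    add the point farthest from everything chosen so far."""
    chosen = [0]
    d = np.linalg.norm(emb - emb[0], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen

# Two tight clusters of "image embeddings"; one label from each cluster
# is far more informative than two random draws from the same cluster.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
idx = farthest_point_selection(emb, 2)
```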
( 2
min )
Machine learning techniques are effective for building predictive models
because they identify patterns in large datasets. Development of a model for
complex real-life problems often stops at the point of publication, proof of
concept, or when made accessible through some mode of deployment. However, a
model in the medical domain risks becoming obsolete as patient demographics,
systems and clinical practices change. The maintenance and monitoring of
predictive model performance post-publication is crucial to enable their safe
and effective long-term use. We will assess the infrastructure required to
monitor the outputs of a machine learning algorithm, and present two scenarios
with examples of monitoring and updates of models, firstly on a breast cancer
prognosis model trained on public longitudinal data, and secondly on a
neurodegenerative stratification algorithm that is currently being developed
and tested in clinic.
( 2
min )
Recent work has shown that forward- and reverse-mode automatic
differentiation (AD) over the reals is almost always correct in a
mathematically precise sense. However, actual programs work with
machine-representable numbers (e.g., floating-point numbers), not reals. In
this paper, we study the correctness of AD when the parameter space of a neural
network consists solely of machine-representable numbers. In particular, we
analyze two sets of parameters on which AD can be incorrect: the incorrect set
on which the network is differentiable but AD does not compute its derivative,
and the non-differentiable set on which the network is non-differentiable. For
a neural network with bias parameters, we first prove that the incorrect set is
always empty. We then prove a tight bound on the size of the non-differentiable
set, which is linear in the number of non-differentiabilities in activation
functions, and give a simple necessary and sufficient condition for a parameter
to be in this set. We further prove that AD always computes a Clarke
subderivative even on the non-differentiable set. We also extend these results
to neural networks possibly without bias parameters.
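The Clarke-subderivative guarantee can be seen in the simplest case: ReLU at its kink, where typical AD systems return 0, which is a valid element of the Clarke subdifferential [0, 1] even though no classical derivative exists there:

```python
def relu(x):
    """ReLU activation: differentiable everywhere except x = 0."""
    return x if x > 0.0 else 0.0

def relu_grad_ad(x):
    """What a typical AD system returns for ReLU: 0 for x <= 0. At the
    kink x = 0 this is not a classical derivative, but it is a valid
    Clarke subderivative (any value in [0, 1] would be)."""
    return 1.0 if x > 0.0 else 0.0

g = relu_grad_ad(0.0)
```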
( 2
min )
Previous pitch-controllable text-to-speech (TTS) models rely on directly
modeling fundamental frequency, leading to low variance in synthesized speech.
To address this issue, we propose PITS, an end-to-end pitch-controllable TTS
model that utilizes variational inference to model pitch. Based on VITS, PITS
incorporates the Yingram encoder, the Yingram decoder, and adversarial training
of pitch-shifted synthesis to achieve pitch-controllability. Experiments
demonstrate that PITS generates high-quality speech that is indistinguishable
from ground truth speech and has high pitch-controllability without quality
degradation. Code, audio samples, and demo are available at
https://github.com/anonymous-pits/pits.
( 2
min )
The mechanism of existing style transfer algorithms is by minimizing a hybrid
loss function to push the generated image toward high similarities in both
content and style. However, this type of approach cannot guarantee visual
fidelity, i.e., the generated artworks should be indistinguishable from real
ones. In this paper, we devise a new style transfer framework called QuantArt
for high visual-fidelity stylization. QuantArt pushes the latent representation
of the generated artwork toward the centroids of the real artwork distribution
with vector quantization. By fusing the quantized and continuous latent
representations, QuantArt allows flexible control over the generated artworks
in terms of content preservation, style similarity, and visual fidelity.
Experiments on various style transfer settings show that our QuantArt framework
achieves significantly higher visual fidelity compared with the existing style
transfer methods.
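The vector-quantization step can be sketched as nearest-centroid snapping plus a blend between quantized and continuous codes. The linear fusion rule below is illustrative, not QuantArt's exact architecture:

```python
import numpy as np

def quantize(latents, codebook, alpha=1.0):
    """Snap each latent to its nearest codebook centroid, then blend
    quantized and continuous codes (alpha=1 is fully quantized)."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    q = codebook[d.argmin(axis=1)]
    return alpha * q + (1 - alpha) * latents

# Toy codebook of two "artwork distribution" centroids.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.2, 0.1], [0.9, 1.2]])
zq = quantize(z, codebook)
```

Varying alpha between 0 and 1 trades content preservation (continuous code) against visual fidelity (quantized code).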
( 2
min )
Recent developments in Deep Learning (DL) suggest a vast potential for
Topology Optimization (TO). However, while there are some promising attempts,
the subfield still lacks a firm footing regarding basic methods and datasets.
We aim to address both points. First, we explore physics-based preprocessing
and equivariant networks to create sample-efficient components for TO DL
pipelines. We evaluate them in a large-scale ablation study using end-to-end
supervised training. The results demonstrate a drastic improvement in sample
efficiency and the predictions' physical correctness. Second, to improve
comparability and future progress, we publish the first two TO datasets
containing problems and corresponding ground truth solutions.
( 2
min )
Fault diagnosis is a crucial area of research in industry. Industrial
processes exhibit diverse operating conditions, where data often have
non-Gaussian, multi-mode, and center-drift characteristics. Data-driven
approaches are currently the main focus in the field, but continuous fault
classification and parameter updates of fault classifiers pose challenges for
multiple operating modes and real-time settings. Thus, a pressing issue is to
achieve real-time multi-mode fault diagnosis in industrial systems. In this
paper, a novel approach to achieve real-time multi-mode fault diagnosis is
proposed for industrial applications, which addresses this critical research
problem. Our approach uses an extended evidence reasoning (ER) algorithm to
fuse information and merge outputs from different base classifiers. These base
classifiers, built on the broad learning system (BLS), are trained to ensure maximum
fault diagnosis accuracy. Furthermore, pseudo-label learning is used to update
model parameters in real-time. The effectiveness of the proposed approach is
demonstrated on the multi-mode Tennessee Eastman process dataset.
( 2
min )
We introduce a new methodology dubbed "safe peeling" to accelerate the
resolution of L0-regularized least-squares problems via a Branch-and-Bound
(BnB) algorithm. Our procedure enables tightening the convex relaxation
considered at each node of the BnB decision tree and therefore potentially
allows for more aggressive pruning. Numerical simulations show that our
proposed methodology leads to significant gains in terms of the number of
nodes explored and the overall solving time.
( 2
min )
Multivariate probabilistic time series forecasts are commonly evaluated via
proper scoring rules, i.e., functions that are minimal in expectation for the
ground-truth distribution. However, this property is not sufficient to
guarantee good discrimination in the non-asymptotic regime. In this paper, we
provide the first systematic finite-sample study of proper scoring rules for
time-series forecasting evaluation. Through a power analysis, we identify the
"region of reliability" of a scoring rule, i.e., the set of practical
conditions where it can be relied on to identify forecasting errors. We carry
out our analysis on a comprehensive synthetic benchmark, specifically designed
to test several key discrepancies between ground-truth and forecast
distributions, and we gauge the generalizability of our findings to real-world
tasks with an application to an electricity production problem. Our results
reveal critical shortcomings in the evaluation of multivariate probabilistic
forecasts as commonly performed in the literature.
( 2
min )
Natural language generation (NLG) is one of the most impactful fields in NLP,
and recent years have witnessed its evolution brought about by large language
models (LLMs). As the key instrument for writing assistance applications, they
are generally prone to replicating or extending offensive content provided in
the input. In low-resource data regimes, they can also lead to repetitive
outputs. Usually, offensive content and repetitions are mitigated with post-hoc
methods, including n-gram level blocklists, top-k and nucleus sampling. In this
paper, we apply non-exact repetition suppression using token and sequence level
unlikelihood loss, and further explore the framework of unlikelihood training
objective in order to jointly endow the model with abilities to avoid
generating offensive words and phrases from the beginning. Finally, with
comprehensive experiments, we demonstrate that our proposed methods work
exceptionally in controlling the repetition and content quality of LLM outputs.
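The token-level unlikelihood term has a compact form: -log(1 - p) summed over undesired tokens. A numpy sketch follows; the sequence-level variant and the paper's joint training objective are omitted:

```python
import numpy as np

def unlikelihood_loss(probs, negative_ids):
    """Token-level unlikelihood: penalize probability mass assigned to
    undesired tokens via -log(1 - p). The loss grows as the model puts
    more mass on a negative token, pushing that mass down in training."""
    eps = 1e-9
    return -np.log(1.0 - probs[negative_ids] + eps).sum()

# Model's next-token distribution over a toy 4-token vocabulary.
probs = np.array([0.6, 0.2, 0.15, 0.05])
# Suppose token 0 is a repeated (or offensive) token we want to suppress.
loss = unlikelihood_loss(probs, [0])
```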
( 2
min )
We use a binary attribute representation (BAR) model to describe a data set
of Netflix viewers' ratings of movies. We classify the viewers with discrete
bits rather than continuous parameters, which makes the representation compact
and transparent. The attributes are easy to interpret, and we need far fewer
attributes than similar methods do to achieve the same level of error. We also
take advantage of the nonuniform distribution of ratings among the movies in
the data set to train on a small selection of movies without compromising
performance on the rest of the movies.
( 2
min )
Bilevel optimization has recently regained interest owing to its applications
in emerging machine learning fields such as hyperparameter optimization,
meta-learning, and reinforcement learning. Recent results have shown that
simple alternating (implicit) gradient-based algorithms can achieve the same
convergence rate of single-level gradient descent (GD) for bilevel problems
with a strongly convex lower-level objective. However, it remains unclear
whether this result can be generalized to bilevel problems beyond this basic
setting. In this paper, we propose a Generalized ALternating mEthod for bilevel
opTimization (GALET) with a nonconvex lower-level objective that satisfies the
Polyak-{\L}ojasiewicz (PL) condition. We first introduce a stationary metric
for the considered bilevel problems, which generalizes the existing metric. We
then establish that GALET achieves an $\epsilon$-stationary metric for the
considered problem within $\tilde{\cal O}(\epsilon^{-1})$ iterations, which
matches the iteration complexity of GD for smooth nonconvex problems.
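The alternating gradient pattern the paper analyzes can be illustrated on a toy bilevel problem; this is a generic sketch of the update structure, not GALET itself:

```python
def alternating_bilevel(x, y, lr=0.1, steps=200):
    """Alternating GD for the toy bilevel problem
    min_x f(x, y*(x)) with y*(x) = argmin_y g(x, y),
    where f = (x - 1)^2 + y^2 and g = (y - x)^2. Here y*(x) = x,
    so dy*/dx = 1 and the hypergradient is 2(x - 1) + 2y."""
    for _ in range(steps):
        y -= lr * 2 * (y - x)              # lower-level GD step on g
        x -= lr * (2 * (x - 1) + 2 * y)    # upper-level hypergradient step
    return x, y

# The true solution is x = y = 0.5 (minimizing (x-1)^2 + x^2).
x, y = alternating_bilevel(0.0, 0.0)
```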
( 2
min )
We present a novel framework for conditional sampling of probability
measures, using block triangular transport maps. We develop the theoretical
foundations of block triangular transport in a Banach space setting,
establishing general conditions under which conditional sampling can be
achieved and drawing connections between monotone block triangular maps and
optimal transport. Based on this theory, we then introduce a computational
approach, called monotone generative adversarial networks (M-GANs), to learn
suitable block triangular maps. Our algorithm uses only samples from the
underlying joint probability measure and is hence likelihood-free. Numerical
experiments with M-GAN demonstrate accurate sampling of conditional measures in
synthetic examples, Bayesian inverse problems involving ordinary and partial
differential equations, and probabilistic image in-painting.
( 2
min )
While black-box variational inference is widely used, there is no proof that
its stochastic optimization succeeds. We suggest this is due to a theoretical
gap in existing stochastic optimization proofs, namely the challenge of gradient
estimators with unusual noise bounds, and a composite non-smooth objective. For
dense Gaussian variational families, we observe that existing gradient
estimators based on reparameterization satisfy a quadratic noise bound and give
novel convergence guarantees for proximal and projected stochastic gradient
descent using this bound. This provides the first rigorous guarantee that
black-box variational inference converges for realistic inference problems.
( 2
min )
Personalized treatment effect estimates are often of interest in high-stakes
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global 'winner', we
therefore empirically investigate the success and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.
( 2
min )
The Brazilian social justice reporter is a fellow at the MIT Center for International Studies.
( 11
min )
Announcements The Missing Part in LLMs and GPT-like Systems These days, all the AI talk is about GPT (Generative Pre-Trained Transformer), LLMs (Large Language Models), generative AI, prompt engineering, and related technologies. You must live alone on a small island if you have never heard these words. LLM originated from NLP (natural language processing) which… Read More »DSC Weekly 6 June 2023 – The Missing Part in LLMs and GPT-like Systems
The post DSC Weekly 6 June 2023 – The Missing Part in LLMs and GPT-like Systems appeared first on Data Science Central.
( 21
min )
PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. With the recent PyTorch 2.0 release, AWS customers can now do the same things as they could with PyTorch 1.x, but faster and at scale, with […]
( 15
min )
Amazon Transcribe is a speech recognition service that generates transcripts from video and audio files in multiple supported languages and accents. It comes with a rich set of features, including automatic language identification, multi-channel and multi-speaker support, custom vocabularies, and transcript redaction. Amazon Transcribe supports two modes of operation: batch and streaming. In batch mode, […]
( 7
min )
Amazon SageMaker Feature Store is a purpose-built service to store and retrieve feature data for use by machine learning (ML) models. Feature Store provides an online store capable of low-latency, high-throughput reads and writes, and an offline store that provides bulk access to all historical record data. Feature Store handles the synchronization of data between […]
( 11
min )
Dear AI Innovators,
( 6
min )
Posted by Ruofei Du, Research Scientist, and Alex Olwal, Senior Staff Research Scientist, Google Augmented Reality
Recent advances in video conferencing have significantly improved remote video communication through features like live captioning and noise cancellation. However, there are various situations where dynamic visual augmentation would be useful to better convey complex and nuanced information. For example, when discussing what to order at a Japanese restaurant, your friends could share visuals that would help you feel more confident about ordering the “Sukiyaki”. Or when talking about your recent family trip to San Francisco, you may want to show a photo from your personal album.
In “Visual Captions: Augmenting Verbal Communication With On-the-fly Visuals”, presented at …
( 93
min )
As a marine biology student, Josef Melchner always dreamed of spending his days cruising the oceans to find dolphins, whales and fish — but also “wanted to do something practical, something that would benefit the world,” he said. When it came time to choose a career, he dove head first into aquaculture. He’s now CEO Read article >
( 6
min )
Keerthan Sathya, a senior technical artist specializing in 3D, emerged triumphant In the NVIDIA Studio this week with the incredibly detailed, expertly constructed, jaw-droppingly beautiful animation Tiny Mammoth.
( 7
min )
Artificial intelligence has emerged as a powerful technology that can drive substantial transformations in businesses across diverse…
( 11
min )
Artificial Intelligence (AI) has emerged as a transformative technology across various industries, and banking is no exception. In recent…
( 10
min )
Web scraping is a technique used to extract data from websites. It allows us to gather information from web pages and use it for various…
( 22
min )
A new multimodal technique blends major self-supervised learning methods to learn more similarly to humans.
( 9
min )
Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. This means that business analysts who want to extract insights from the large volumes of data in their data warehouse must frequently use […]
( 8
min )
Amazon SageMaker Automatic Model Tuning has introduced Autotune, a new feature to automatically choose hyperparameters on your behalf. This provides an accelerated and more efficient way to find hyperparameter ranges, and can provide significant optimized budget and time management for your automatic model tuning jobs. In this post, we discuss this new capability and some […]
( 8
min )
This post is co-written with Philipp Schmid from Hugging Face. We have all heard about the progress being made in the field of large language models (LLMs) and the ever-growing number of problem sets where LLMs are providing valuable insights. Large models, when trained over massive datasets and several tasks, are also able to generalize […]
( 13
min )
This post is co-written with Philipp Schmid and Jeff Boudier from Hugging Face. Today, as part of Amazon Web Services’ partnership with Hugging Face, we are excited to announce the release of a new Hugging Face Deep Learning Container (DLC) for inference with Large Language Models (LLMs). This new Hugging Face LLM DLC is powered […]
( 7
min )
Jiusheng Chen’s team just got accelerated. They’re delivering personalized ads to users of Microsoft Bing with 7x throughput at reduced cost, thanks to NVIDIA Triton Inference Server running on NVIDIA A100 Tensor Core GPUs. It’s an amazing achievement for the principal software engineering manager and his crew. Tuning a Complex System Bing’s ad service uses Read article >
( 4
min )
Maria Girone is expanding the world’s largest network of scientific computers with accelerated computing and AI.
( 6
min )
Ambulatory surgery centers face unique financial challenges in the fast-paced healthcare industry. With AI, ASCs can unlock untapped revenue potential. AI revolutionizes revenue cycles, optimizes billing processes, and drives significant financial growth in ASCs. Healthcare is slower to adopt new technologies than manufacturing and retail. In our blog “Must Have Medical Practice Technologies to Boost… Read More »AI As A Catalyst For Financial Success In ASCs: Unlocking Revenue Potential
The post AI As A Catalyst For Financial Success In ASCs: Unlocking Revenue Potential appeared first on Data Science Central.
( 21
min )
Navigation is a complex skill with a long history of research in animals and
humans. In this work, we simulate the Morris Water Maze in 2D to train deep
reinforcement learning agents. We perform automatic classification of
navigation strategies, analyze the distribution of strategies used by
artificial agents, and compare them with experimental data to show learning
dynamics similar to those seen in humans and rodents. We develop
environment-specific auxiliary tasks and examine factors affecting their
usefulness. We suggest that the most beneficial tasks are potentially more
biologically feasible for real agents to use. Lastly, we explore the
development of internal representations in the activations of artificial agent
neural networks. These representations resemble place cells and head-direction
cells found in mouse brains, and their presence correlates with the
navigation strategies that artificial agents employ.
( 2
min )
Generative AI models have recently achieved astonishing results in quality
and are consequently employed in a fast-growing number of applications.
However, since they are highly data-driven, relying on billion-sized datasets
randomly scraped from the internet, they also suffer from degenerated and
biased human behavior, as we demonstrate. In fact, they may even reinforce such
biases. To not only uncover but also combat these undesired effects, we present
a novel strategy, called Fair Diffusion, to attenuate biases after the
deployment of generative text-to-image models. Specifically, we demonstrate
shifting a bias, based on human instructions, in any direction yielding
arbitrarily new proportions for, e.g., identity groups. As our empirical
evaluation demonstrates, this introduced control enables instructing generative
image models on fairness, with no data filtering or additional training
required.
( 2
min )
We consider deep neural networks with a Lipschitz continuous activation
function and with weight matrices of variable widths. We establish a uniform
convergence analysis framework in which sufficient conditions on weight
matrices and bias vectors together with the Lipschitz constant are provided to
ensure uniform convergence of the deep neural networks to a meaningful function
as the number of their layers tends to infinity. In the framework, special
results on uniform convergence of deep neural networks with a fixed width,
bounded widths and unbounded widths are presented. In particular, as
convolutional neural networks are special deep neural networks with weight
matrices of increasing widths, we put forward conditions on the mask sequence
which lead to uniform convergence of resulting convolutional neural networks.
The Lipschitz continuity assumption on the activation functions allows us to
include in our theory most of the activation functions commonly used in
applications.
( 2
min )
Matching algorithms are commonly used to predict matches between items in a
collection. For example, in 1:1 face verification, a matching algorithm
predicts whether two face images depict the same person. Accurately assessing
the uncertainty of the error rates of such algorithms can be challenging when
data are dependent and error rates are low, two aspects that have been often
overlooked in the literature. In this work, we review methods for constructing
confidence intervals for error rates in matching tasks such as 1:1 face
verification. We derive and examine the statistical properties of these methods
and demonstrate how coverage and interval width vary with sample size, error
rates, and degree of data dependence using both synthetic and real-world
datasets. Based on our findings, we provide recommendations for best practices
for constructing confidence intervals for error rates in matching tasks.
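As a concrete, simplified illustration of the interval-construction problem: when trials can be assumed independent (an assumption the abstract notes is often violated in matching tasks), the Wilson score interval is one standard choice. A minimal stdlib-only sketch, not the paper's recommended procedure:

```python
import math

def wilson_interval(errors, trials, z=1.96):
    """~95% Wilson score interval for an error rate, assuming independent trials."""
    p = errors / trials
    denom = 1.0 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))
    return center - half, center + half

lo, hi = wilson_interval(errors=3, trials=1000)
print(f"~95% CI for the error rate: [{lo:.4f}, {hi:.4f}]")
```

Unlike the naive normal approximation, the Wilson interval stays within [0, 1] and behaves sensibly at the low error rates the paper is concerned with; data dependence, however, still requires the corrections the paper studies.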
( 2
min )
Neural networks are powerful functions with widespread use, but the
theoretical behaviour of these functions is not fully understood. Creating deep
neural networks by stacking many layers has achieved exceptional performance in
many applications and contributed to the recent explosion of these methods.
Previous works have shown that depth can exponentially increase the
expressibility of the network. However, as networks get deeper and deeper, they
are more susceptible to becoming degenerate. We observe this degeneracy in the
sense that on initialization, inputs tend to become more and more correlated as
they travel through the layers of the network. If a network has too many
layers, it tends to approximate a (random) constant function, making it
effectively incapable of distinguishing between inputs. This seems to affect
the training of the network and cause it to perform poorly, as we empirically
investigate in this paper. We use a simple algorithm that can accurately
predict the level of degeneracy for any given fully connected ReLU network
architecture, and demonstrate how the predicted degeneracy relates to training
dynamics of the network. We also compare this prediction to predictions derived
using infinite width networks.
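The correlation-growth phenomenon described above is easy to reproduce in a toy setting. The sketch below (an illustration only, not the paper's degeneracy-prediction algorithm) passes two initially orthogonal inputs through the same randomly initialized ReLU layers and tracks their cosine similarity, which drifts toward 1 with depth:

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def forward_pair(x, y, depth=20, width=64, seed=0):
    """Push two inputs through the *same* random ReLU layers, recording cosine similarity."""
    rng = random.Random(seed)
    sims = [cosine(x, y)]
    for _ in range(depth):
        scale = math.sqrt(2.0 / len(x))  # He-style initialization
        W = [[rng.gauss(0.0, scale) for _ in range(len(x))] for _ in range(width)]
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]
        y = [max(0.0, sum(w * yi for w, yi in zip(row, y))) for row in W]
        sims.append(cosine(x, y))
    return sims

x0 = [1.0] + [0.0] * 15       # two orthogonal inputs
y0 = [0.0, 1.0] + [0.0] * 14
sims = forward_pair(x0, y0)
print(f"cosine similarity: layer 0 = {sims[0]:.2f}, layer 20 = {sims[-1]:.2f}")
```

As the layers accumulate, the two inputs become nearly indistinguishable, which is precisely the degeneracy at initialization that the paper links to poor training.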
( 2
min )
Recent work on mini-batch consistency (MBC) for set functions has brought
attention to the need for sequentially processing and aggregating chunks of a
partitioned set while guaranteeing the same output for all partitions. However,
existing constraints on MBC architectures lead to models with limited
expressive power. Additionally, prior work has not addressed how to deal with
large sets during training when the full set gradient is required. To address
these issues, we propose a Universally MBC (UMBC) class of set functions which
can be used in conjunction with arbitrary non-MBC components while still
satisfying MBC, enabling a wider range of function classes to be used in MBC
settings. Furthermore, we propose an efficient MBC training algorithm which
gives an unbiased approximation of the full set gradient and has a constant
memory overhead for any set size for both train- and test-time. We conduct
extensive experiments including image completion, text classification,
unsupervised clustering, and cancer detection on high-resolution images to
verify the efficiency and efficacy of our scalable set encoding framework.
( 2
min )
The prediscretisation of numerical attributes required by some rule learning
algorithms is a source of inefficiency. This paper describes new
rule tuning steps that aim to recover lost information in the discretisation
and new pruning techniques that may further reduce the size of rule models and
improve their accuracy. The proposed QCBA method was initially developed to
postprocess quantitative attributes in models generated by the Classification
based on associations (CBA) algorithm, but it can also be applied to the
results of other rule learning approaches. We demonstrate its effectiveness on
the postprocessing of models generated by five association rule classification
algorithms (CBA, CMAR, CPAR, IDS, SBRL) and two first-order logic rule learners
(FOIL2 and PRM). Benchmarks on 22 datasets from the UCI repository show smaller
size and the overall best predictive performance for FOIL2+QCBA compared to all
seven baselines. Postoptimised CBA models have a better predictive performance
compared to the state-of-the-art rule learner CORELS in this benchmark. The
article contains an ablation study for the individual postprocessing steps and
a scalability analysis on the KDD'99 Anomaly detection dataset.
( 2
min )
Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a
fraction of the training data is manipulated to deliberately degrade the
algorithms' performance. Optimal attacks can be formulated as bilevel
optimization problems and help to assess their robustness in worst-case
scenarios. We show that current approaches, which typically assume that
hyperparameters remain constant, lead to an overly pessimistic view of the
algorithms' robustness and of the impact of regularization. We propose a novel
optimal attack formulation that considers the effect of the attack on the
hyperparameters and models the attack as a multiobjective bilevel optimization
problem. This allows us to formulate optimal attacks, learn hyperparameters, and
evaluate robustness under worst-case conditions. We apply this attack
formulation to several ML classifiers using $L_2$ and $L_1$ regularization. Our
evaluation on multiple datasets confirms the limitations of previous strategies
and evidences the benefits of using $L_2$ and $L_1$ regularization to dampen
the effect of poisoning attacks.
( 2
min )
The relational data model was designed to facilitate large-scale data
management and analytics. We consider the problem of how to differentiate
computations expressed relationally. We show experimentally that a relational
engine running an auto-differentiated relational algorithm can easily scale to
very large datasets, and is competitive with state-of-the-art, special-purpose
systems for large-scale distributed machine learning.
( 2
min )
We introduce an efficient and robust auto-tuning framework for hyperparameter
selection in dimension reduction (DR) algorithms, focusing on large-scale
datasets and arbitrary performance metrics. By leveraging Bayesian optimization
(BO) with a surrogate model, our approach enables efficient hyperparameter
selection with multi-objective trade-offs and allows us to perform data-driven
sensitivity analysis. By incorporating normalization and subsampling, the
proposed framework demonstrates versatility and efficiency, as shown in
applications to visualization techniques such as t-SNE and UMAP. We evaluate
our results on various synthetic and real-world datasets using multiple quality
metrics, providing a robust and efficient solution for hyperparameter selection
in DR algorithms.
( 2
min )
Scaling methods have long been utilized to simplify and cluster
high-dimensional data. However, the general latent spaces these methods derive
across all predefined groups sometimes fail to capture the specific
within-group patterns researchers are interested in. To tackle this issue, we
adopt an emerging analysis approach called contrastive learning. We contribute
to this growing field by extending its ideas to multiple correspondence
analysis (MCA) in order to enable an analysis of data often encountered by
social scientists -- containing binary, ordinal, and nominal variables. We
demonstrate the utility of contrastive MCA (cMCA) by analyzing two different
surveys of voters in the U.S. and U.K. Our results suggest that, first, cMCA
can identify substantively important dimensions and divisions among subgroups
that are overlooked by traditional methods; second, in other cases, cMCA can
derive latent traits that emphasize subgroups that appear only moderately in
those derived by traditional methods.
( 2
min )
The strength of modern generative models lies in their ability to be
controlled through text-based prompts. Typical "hard" prompts are made from
interpretable words and tokens, and must be hand-crafted by humans. There are
also "soft" prompts, which consist of continuous feature vectors. These can be
discovered using powerful optimization methods, but they cannot be easily
interpreted, re-used across models, or plugged into a text-based interface.
We describe an approach to robustly optimize hard text prompts through
efficient gradient-based optimization. Our approach automatically generates
hard text-based prompts for both text-to-image and text-to-text applications.
In the text-to-image setting, the method creates hard prompts for diffusion
models, allowing API users to easily generate, discover, and mix and match
image concepts without prior knowledge of how to prompt the model. In the
text-to-text setting, we show that hard prompts can be automatically discovered
that are effective in tuning LMs for classification.
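A heavily simplified sketch of the projection idea behind such methods: optimize a continuous ("soft") embedding by gradient descent, then snap it to the nearest token embedding to obtain an interpretable hard prompt. The real method interleaves projection with optimization and uses model gradients; the tiny vocabulary below is invented purely for illustration:

```python
# Invented toy "vocabulary" of token embeddings (not from any real model).
vocab = {"cat": (1.0, 0.0), "dog": (0.8, 0.6), "car": (0.0, 1.0), "sky": (-1.0, 0.2)}
target = (0.7, 0.7)  # embedding we would like the prompt to match

def nearest_token(e):
    return min(vocab, key=lambda t: sum((a - b) ** 2 for a, b in zip(vocab[t], e)))

# Optimize the continuous ("soft") embedding toward the target ...
e = [0.0, 0.0]
for _ in range(100):
    grad = [2.0 * (ei - ti) for ei, ti in zip(e, target)]
    e = [ei - 0.1 * g for ei, g in zip(e, grad)]

# ... then project onto the vocabulary to recover a discrete, reusable token.
print(nearest_token(e))
```

The payoff the abstract describes is exactly this discreteness: unlike the optimized soft vector, the projected token can be typed into any text interface and reused across models.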
( 2
min )
We propose a novel method to optimize the structure of factor graphs for
graph-based inference. As an example inference task, we consider symbol
detection on linear inter-symbol interference channels. The factor graph
framework has the potential to yield low-complexity symbol detectors. However,
the sum-product algorithm on cyclic factor graphs is suboptimal and its
performance is highly sensitive to the underlying graph. Therefore, we optimize
the structure of the underlying factor graphs in an end-to-end manner using
machine learning. For that purpose, we transform the structural optimization
into a clustering problem of low-degree factor nodes that incorporates the
known channel model into the optimization. Furthermore, we study the
combination of this approach with neural belief propagation, yielding
near-maximum a posteriori symbol detection performance for specific channels.
( 2
min )
Recent advances to combine structured regression models and deep neural
networks for better interpretability, more expressiveness, and statistically
valid uncertainty quantification demonstrate the versatility of semi-structured
neural networks (SSNs). We show that techniques to properly identify the
contributions of the different model components in SSNs, however, lead to
suboptimal network estimation, slower convergence, and degenerated or erroneous
predictions. In order to solve these problems while preserving favorable model
properties, we propose a non-invasive post-hoc orthogonalization (PHO) that
guarantees identifiability of model components and provides better estimation
and prediction quality. Our theoretical findings are supported by numerical
experiments, a benchmark comparison as well as a real-world application to
COVID-19 infections.
( 2
min )
We propose an efficient algorithm for matching two correlated
Erd\H{o}s--R\'enyi graphs with $n$ vertices whose edges are correlated through
a latent vertex correspondence. When the edge density $q= n^{- \alpha+o(1)}$
for a constant $\alpha \in [0,1)$, we show that our algorithm has polynomial
running time and succeeds to recover the latent matching as long as the edge
correlation is non-vanishing. This is closely related to our previous work on a
polynomial-time algorithm that matches two Gaussian Wigner matrices with
non-vanishing correlation, and provides the first polynomial-time random graph
matching algorithm (regardless of the regime of $q$) when the edge correlation
is below the square root of Otter's constant (which is $\approx 0.338$).
( 2
min )
.NET and Java are both widely used development languages. Businesses rely on them to build web pages and websites, and both support server-side as well as desktop application development. Selecting one language can seem very challenging… Read More ».NET Full Stack Web Development Vs. Java Full Stack Web Development – Which is Better?
The post .NET Full Stack Web Development Vs. Java Full Stack Web Development – Which is Better? appeared first on Data Science Central.
( 22
min )
The season of hot sun and longer days is here, so stay inside this summer with 20 games joining GeForce NOW in June. Or stream across devices by the pool, from grandma’s house or in the car — whichever way, GeForce NOW has you covered. Titles from the Age of Empires series are the next Read article >
( 6
min )
Neuralangelo, a new AI model by NVIDIA Research for 3D reconstruction using neural networks, turns 2D video clips into detailed 3D structures — generating lifelike virtual replicas of buildings, sculptures and other real-world objects. Like Michelangelo sculpting stunning, life-like visions from blocks of marble, Neuralangelo generates 3D structures with intricate details and textures. Creative professionals Read article >
( 5
min )
Pretraining a neural network on a large dataset is becoming a cornerstone in
machine learning, yet it is within the reach of only a few communities with
large resources. We aim at the ambitious goal of democratizing pretraining.
Towards that goal, we train and release a single neural network that can
predict high quality ImageNet parameters of other neural networks. By using
predicted parameters for initialization we are able to boost training of
diverse ImageNet models available in PyTorch. When transferred to other
datasets, models initialized with predicted parameters also converge faster and
reach competitive final performance.
( 2
min )
In real-world domains, most graphs naturally exhibit a hierarchical
structure. However, data-driven graph generation is yet to effectively capture
such structures. To address this, we propose a novel approach that recursively
generates community structures at multiple resolutions, with the generated
structures conforming to training data distribution at each level of the
hierarchy. The graph generation is designed as a sequence of coarse-to-fine
generative models allowing for parallel generation of all sub-structures,
resulting in a high degree of scalability. Our method demonstrates generative
performance improvement on multiple graph datasets.
( 2
min )
In diagnosing challenging conditions such as Alzheimer's disease (AD),
imaging is an important reference. Non-imaging patient data such as patient
information, genetic data, medication information, and cognitive and memory
tests also play a very important role in diagnosis. However, limited by the
ability of artificial intelligence models to mine such information, most
existing models use only multi-modal image data and cannot make full use of
non-image data. We use a popular pre-trained large language model (LLM) to
enhance the model's ability to utilize non-image data, and achieve SOTA
results on the ADNI dataset.
( 2
min )
Hyperparameter optimization (HPO) is a vital step in improving performance in
deep learning (DL). Practitioners are often faced with the trade-off between
multiple criteria, such as accuracy and latency. Given the high computational
needs of DL and the growing demand for efficient HPO, the acceleration of
multi-objective (MO) optimization becomes ever more important. Despite the
significant body of work on meta-learning for HPO, existing methods are
inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet
powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function
to the meta-learning setting using a task similarity defined by the overlap of
top domains between tasks. We also theoretically analyze and address the
limitations of our task similarity. In the experiments, we demonstrate that our
method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art
performance. Our method was also validated externally by winning the AutoML
2022 competition on "Multiobjective Hyperparameter Optimization for
Transformers".
( 2
min )
Global optimization of decision trees has been shown to be promising in terms of
accuracy, size, and consequently human comprehensibility. However, many of the
methods used rely on general-purpose solvers for which scalability remains an
issue. Dynamic programming methods have been shown to scale much better because
they exploit the tree structure by solving subtrees as independent subproblems.
However, this only works when an objective can be optimized separately for
subtrees. We explore this relationship in detail and show necessary and
sufficient conditions for such separability and generalize previous dynamic
programming approaches into a framework that can optimize any combination of
separable objectives and constraints. Experiments on four application domains
show the general applicability of this framework, while outperforming the
scalability of general-purpose solvers by a large margin.
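The separability condition can be illustrated with a toy dynamic program: when the objective is a sum of per-leaf misclassification counts, the optimal cost of a node depends only on the points reaching it, so the left and right subtrees are independent subproblems. A minimal sketch on one-dimensional data, illustrative only and not the paper's framework:

```python
# Toy dataset of (feature, label) pairs. The objective (misclassification
# count) is separable: a node's optimal cost depends only on its points.
data = [(0.5, 0), (1.5, 0), (2.5, 1), (3.5, 1), (4.5, 0)]

def leaf_cost(points):
    # Best constant prediction is the majority label.
    ones = sum(label for _, label in points)
    return min(ones, len(points) - ones)

def best_tree(points, depth):
    """Optimal misclassification count with at most `depth` levels of splits."""
    cost = leaf_cost(points)
    if depth == 0 or cost == 0:
        return cost
    for threshold, _ in points:
        left = [p for p in points if p[0] < threshold]
        right = [p for p in points if p[0] >= threshold]
        if left and right:
            # Separability: the two subtrees are solved independently.
            cost = min(cost, best_tree(left, depth - 1) + best_tree(right, depth - 1))
    return cost

print(best_tree(data, depth=2), best_tree(data, depth=1), best_tree(data, depth=0))
```

A non-separable objective, say a penalty on the total number of leaves shared across subtrees with a global budget, would break this decomposition, which is why the paper's characterization of separability matters for scalability.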
( 2
min )
We study single-machine scheduling of jobs, each belonging to a job type that
determines its duration distribution. We start by analyzing the scenario where
the type characteristics are known and then move to two learning scenarios
where the types are unknown: non-preemptive problems, where each started job
must be completed before moving to another job; and preemptive problems, where
job execution can be paused in favor of moving to a different job. In both
cases, we design algorithms that achieve sublinear excess cost, compared to the
performance with known types, and prove lower bounds for the non-preemptive
case. Notably, we demonstrate, both theoretically and through simulations, how
preemptive algorithms can greatly outperform non-preemptive ones when the
durations of different job types are far from one another, a phenomenon that
does not occur when the type durations are known.
( 2
min )
Physics-Informed Neural Networks (PINNs) have become a prominent application
of deep learning in scientific computation, as they are powerful approximators
of solutions to nonlinear partial differential equations (PDEs). There have
been numerous attempts to facilitate the training process of PINNs by adjusting
the weight of each component of the loss function, called adaptive
loss-balancing algorithms. In this paper, we propose an Augmented Lagrangian
relaxation method for PINNs (AL-PINNs). We treat the initial and boundary
conditions as constraints for the optimization problem of the PDE residual. By
employing Augmented Lagrangian relaxation, the constrained optimization problem
becomes a sequential max-min problem so that the learnable parameters $\lambda$
adaptively balance each loss component. Our theoretical analysis reveals that
the sequence of minimizers of the proposed loss functions converges to an
actual solution for the Helmholtz, viscous Burgers, and Klein--Gordon
equations. We demonstrate through various numerical experiments that AL-PINNs
yield a much smaller relative error compared with that of state-of-the-art
adaptive loss-balancing algorithms.
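The augmented Lagrangian mechanism can be sketched on a scalar stand-in problem: minimize an "interior residual" subject to a "boundary" constraint, alternating minimization in the parameters with ascent in the multiplier $\lambda$. This toy is only an analogy for AL-PINNs, not an implementation of them:

```python
def f(x):   # stand-in for the PDE residual loss
    return (x - 2.0) ** 2

def g(x):   # stand-in constraint, e.g. a boundary condition g(x) = 0
    return x - 1.0

# Augmented Lagrangian: f(x) + lam * g(x) + (mu / 2) * g(x)**2
x, lam, mu, lr = 0.0, 0.0, 10.0, 0.01
for _ in range(20):                           # outer dual-ascent iterations
    for _ in range(500):                      # inner minimization in x
        grad = 2.0 * (x - 2.0) + lam + mu * g(x)
        x -= lr * grad
    lam += mu * g(x)                          # multiplier update: the adaptive weight
print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

The multiplier converges to the value that exactly balances the residual and constraint terms, which is the role the learnable $\lambda$ plays in balancing loss components in AL-PINNs; here the constrained optimum is x = 1 with multiplier 2.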
( 2
min )
A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random
variables (the vertices); a Bayesian Network Distribution (BND) is a
probability distribution on the random variables that is Markovian on the
graph. A finite $k$-mixture of such models is graphically represented by a
larger graph which has an additional ``hidden'' (or ``latent'') random variable
$U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other
vertex. Models of this type are fundamental to causal inference, where $U$
models an unobserved confounding effect of multiple populations, obscuring the
causal relationships in the observable DAG. By solving the mixture problem and
recovering the joint probability distribution with $U$, traditionally
unidentifiable causal relationships become identifiable. Using a reduction to
the more well-studied ``product'' case on empty graphs, we give the first
algorithm to learn mixtures of non-empty DAGs.
( 2
min )
The manifold hypothesis, which assumes that data lies on or close to an
unknown manifold of low intrinsic dimension, is a staple of modern machine
learning research. However, recent work has shown that real-world data exhibits
distinct non-manifold structures, i.e. singularities, that can lead to
erroneous findings. Detecting such singularities is therefore crucial as a
precursor to interpolation and inference tasks. We address this issue by
developing a topological framework that (i) quantifies the local intrinsic
dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness'
of a point along multiple scales. Our approach identifies singularities of
complex spaces, while also capturing singular structures and local geometric
complexity in image data.
( 2
min )
Graph Neural Networks (GNNs) have been shown to be inherently
susceptible to the problems of over-smoothing and over-squashing. These issues
limit the ability of GNNs to take distant information into account and hence
to model complex graph interactions. Our study
reveals the key connection between the local graph geometry and the occurrence
of both of these issues, thereby providing a unified framework for studying
them at a local scale using the Ollivier-Ricci curvature. Specifically, we
demonstrate that over-smoothing is linked to positive graph curvature while
over-squashing is linked to negative graph curvature. Based on our theory, we
propose the Batch Ollivier-Ricci Flow, a novel rewiring algorithm capable of
simultaneously addressing both over-smoothing and over-squashing.
( 2
min )
We study the loss landscape of two-layer mildly overparameterized ReLU neural
networks on a generic finite input dataset for the squared error loss. Our
approach involves bounding the dimension of the sets of local and global minima
using the rank of the Jacobian of the parameterization map. Using results on
random binary matrices, we show most activation patterns correspond to
parameter regions with no bad differentiable local minima. Furthermore, for
one-dimensional input data, we show most activation regions realizable by the
network contain a high dimensional set of global minima and no bad local
minima. We experimentally confirm these results by finding a phase transition
from most regions having full rank to many regions having deficient rank
depending on the amount of overparameterization.
( 2
min )
Neuro-Symbolic (NeSy) predictive models hold the promise of improved
compliance with given constraints, systematic generalization, and
interpretability, as they allow inferring labels that are consistent with some
prior knowledge by reasoning over high-level concepts extracted from
sub-symbolic inputs. It was recently shown that NeSy predictors are affected by
reasoning shortcuts: they can attain high accuracy but by leveraging concepts
with unintended semantics, thus coming short of their promised advantages. Yet,
a systematic characterization of reasoning shortcuts and of potential
mitigation strategies is missing. This work fills this gap by characterizing
them as unintended optima of the learning objective and identifying four key
conditions behind their occurrence. Based on this, we derive several natural
mitigation strategies, and analyze their efficacy both theoretically and
empirically. Our analysis shows reasoning shortcuts are difficult to deal with,
casting doubts on the trustworthiness and interpretability of existing NeSy
solutions.
( 2
min )
The manifold hypothesis, which assumes that data lies on or close to an
unknown manifold of low intrinsic dimension, is a staple of modern machine
learning research. However, recent work has shown that real-world data exhibits
distinct non-manifold structures, i.e. singularities, that can lead to
erroneous findings. Detecting such singularities is therefore crucial as a
precursor to interpolation and inference tasks. We address this issue by
developing a topological framework that (i) quantifies the local intrinsic
dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness'
of a point along multiple scales. Our approach identifies singularities of
complex spaces, while also capturing singular structures and local geometric
complexity in image data.
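For intuition on point (i), local intrinsic dimension is often estimated with a local-PCA baseline: take a point's nearest neighbours and count how many principal axes are needed to explain most of the local variance. The sketch below illustrates that baseline only, not the paper's topological Euclidicity framework; all names are illustrative.

```python
import numpy as np

def local_intrinsic_dimension(X, idx, k=20, var_threshold=0.95):
    """Estimate the local intrinsic dimension at X[idx] via PCA on its k
    nearest neighbours: the number of principal axes needed to explain
    var_threshold of the local variance."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    nbrs = X[np.argsort(dists)[:k]]            # the k nearest points
    centred = nbrs - nbrs.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    var_ratio = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var_ratio), var_threshold) + 1)

# a 2D grid embedded isometrically in 5D: local dimension should come out as 2
g = np.linspace(-1, 1, 5)
Z = np.array([[a, b] for a in g for b in g])
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(5, 2)))
X = Z @ Q.T
dim = local_intrinsic_dimension(X, idx=12, k=25)
print(dim)  # → 2
```

Near a singularity (e.g. two sheets crossing), this count is unstable across scales, which is exactly the kind of non-manifold behaviour the paper's multi-scale score is designed to detect.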
( 2
min )
We propose an efficient online approximate Bayesian inference algorithm for
estimating the parameters of a nonlinear function from a potentially
non-stationary data stream. The method is based on the extended Kalman filter
(EKF), but uses a novel low-rank plus diagonal decomposition of the posterior
precision matrix, which gives a cost per step that is linear in the number of
model parameters. In contrast to methods based on stochastic variational
inference, our method is fully deterministic, and does not require step-size
tuning. We show experimentally that this results in much faster (more sample
efficient) learning, which leads to more rapid adaptation to changing
distributions, and faster accumulation of reward when used as part of a
contextual bandit algorithm.
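The computational payoff of a low-rank-plus-diagonal precision is that linear solves never touch the dense matrix. A minimal sketch of that structure using the Woodbury identity (illustrative only; the paper's actual EKF recursion is more involved):

```python
import numpy as np

def woodbury_solve(d, W, v):
    """Solve (diag(d) + W @ W.T) x = v in O(P r^2) via the Woodbury identity,
    exploiting the low-rank-plus-diagonal structure of the precision."""
    Dinv_v = v / d
    Dinv_W = W / d[:, None]
    r = W.shape[1]
    # (I_r + W.T D^{-1} W) is only r x r, so this inner solve is cheap
    S = np.eye(r) + W.T @ Dinv_W
    return Dinv_v - Dinv_W @ np.linalg.solve(S, W.T @ Dinv_v)

rng = np.random.default_rng(1)
P, r = 500, 5
d = rng.uniform(1.0, 2.0, size=P)      # diagonal part of the precision
W = rng.normal(size=(P, r))            # low-rank factor
v = rng.normal(size=P)
x = woodbury_solve(d, W, v)
# agrees with the dense O(P^3) solve
dense = np.linalg.solve(np.diag(d) + W @ W.T, v)
print(np.allclose(x, dense))  # → True
```

With P model parameters and rank r, the cost is O(P r^2) per solve instead of O(P^3), which is what makes the per-step cost linear in the number of parameters.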
( 2
min )
The convergence of deterministic policy gradient under the Hadamard
parametrization is studied in the tabular setting and the global linear
convergence of the algorithm is established. To this end, we first show that
the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based
on this result, we further show that the algorithm has a faster local linear
convergence rate after $k_0$ iterations, where $k_0$ is a constant that only
depends on the MDP problem and the step size. Overall, the algorithm displays a
linear convergence rate for all the iterations, with a looser constant than that
for the local linear convergence rate.
( 2
min )
Pairwise learning refers to learning tasks where a loss takes a pair of
samples into consideration. In this paper, we study pairwise learning with deep
ReLU networks and estimate the excess generalization error. For a general loss
satisfying some mild conditions, a sharp bound for the estimation error of
order $O((V\log(n) /n)^{1/(2-\beta)})$ is established. In particular, with the
pairwise least squares loss, we derive a nearly optimal bound of the excess
generalization error which achieves the minimax lower bound up to a logarithmic
term when the true predictor satisfies some smoothness regularities.
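To make the pairwise setting concrete: the empirical loss averages over pairs of samples rather than single samples. A minimal sketch of the pairwise least squares loss for scalar scores (illustrative only; the paper's predictors are deep ReLU networks):

```python
import numpy as np

def pairwise_least_squares_loss(f_vals, y):
    """Empirical pairwise least squares loss (sketch): the pair prediction is
    the score gap f(x_i) - f(x_j) and the target is the label gap y_i - y_j,
    averaged over all ordered pairs with i != j."""
    pred_diff = f_vals[:, None] - f_vals[None, :]   # all pairwise score gaps
    true_diff = y[:, None] - y[None, :]             # all pairwise label gaps
    n = len(y)
    mask = ~np.eye(n, dtype=bool)                   # exclude i == j pairs
    return np.mean((pred_diff - true_diff)[mask] ** 2)

y = np.array([0.0, 1.0, 2.0])
loss = pairwise_least_squares_loss(y.copy(), y)
print(loss)  # perfect scores → 0.0
```

Note the loss depends only on score differences, so it is invariant to adding a constant to all predictions; `pairwise_least_squares_loss(y + 5.0, y)` is also exactly zero.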
( 2
min )
Second order stochastic optimizers allow parameter update step size and
direction to adapt to loss curvature, but have traditionally required too much
memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018]
introduced a Kronecker factored preconditioner to reduce these requirements: it
is used for large deep models [Anil et al., 2020] and in production [Anil et
al., 2022]. However, it takes inverse matrix roots of ill-conditioned matrices.
This requires 64-bit precision, imposing strong hardware constraints. In this
paper, we propose a novel factorization, Kronecker Approximation-Domination
(KrAD). Using KrAD, we update a matrix that directly approximates the inverse
empirical Fisher matrix (like full matrix AdaGrad), avoiding inversion and
hence 64-bit precision. We then propose KrADagrad$^\star$, with similar
computational costs to Shampoo and the same regret. Synthetic ill-conditioned
experiments show improved performance over Shampoo for 32-bit precision, while
for several real datasets we have comparable or better generalization.
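For context, the Shampoo baseline maintains two Kronecker statistics per matrix parameter and preconditions the gradient with their inverse fourth roots; that inverse-root step is where the ill-conditioning and the 64-bit requirement come from. A toy sketch of the Shampoo-style update follows (after Gupta et al., 2018; this is not the paper's KrAD update, which instead tracks an approximation of the inverse directly and avoids the root computation):

```python
import numpy as np

def inv_root(M, p, eps=1e-6):
    """Inverse p-th matrix root via eigendecomposition -- the ill-conditioned
    step that forces Shampoo into 64-bit precision."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag((w + eps) ** (-1.0 / p)) @ V.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo-style step for a matrix parameter W with gradient G."""
    L += G @ G.T                     # left Kronecker statistic  (m x m)
    R += G.T @ G                     # right Kronecker statistic (n x n)
    W -= lr * inv_root(L, 4) @ G @ inv_root(R, 4)
    return W, L, R

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
G = rng.normal(size=(4, 3))
L, R = np.eye(4), np.eye(3)
W2, L, R = shampoo_step(W.copy(), G, L, R)
print(W2.shape)  # (4, 3)
```

The statistics are only m x m and n x n rather than mn x mn, which is the memory saving that makes Kronecker-factored second-order methods feasible for deep models.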
( 2
min )
We address the problem of biased gradient estimation in deep Boltzmann
machines (DBMs). The existing method to obtain an unbiased estimator uses a
maximal coupling based on a Gibbs sampler, but when the state is
high-dimensional, it takes a long time to converge. In this study, we propose
to use a coupling based on the Metropolis-Hastings (MH) and to initialize the
state around a local mode of the target distribution. Because of the propensity
of MH to reject proposals, the coupling tends to converge in only one step with
a high probability, leading to high efficiency. We find that our method allows
DBMs to be trained in an end-to-end fashion without greedy pretraining. We also
propose some practical techniques to further improve the performance of DBMs.
We empirically demonstrate that our training algorithm enables DBMs to show
comparable generative performance to other deep generative models, achieving
the FID score of 10.33 for MNIST.
( 2
min )
A critical component of business success is the ability to connect with customers. Businesses today want to connect with their customers by offering their content across multiple languages in real time. For most customers, the content creation process is disconnected from the localization effort of translating content into multiple target languages. These disconnected processes delay […]
( 5
min )
Running machine learning (ML) workloads with containers is becoming a common practice. Containers can fully encapsulate not just your training code, but the entire dependency stack down to the hardware libraries and drivers. What you get is an ML development environment that is consistent and portable. With containers, scaling on a cluster becomes much easier. […]
( 9
min )
PyTorch is a machine learning (ML) framework based on the Torch library, used for applications such as computer vision and natural language processing. One of the primary reasons that customers are choosing a PyTorch framework is its simplicity and the fact that it’s designed and assembled to work with Python. PyTorch supports dynamic computational graphs, […]
( 12
min )
The Amazon SageMaker Python SDK is an open-source library for training and deploying machine learning (ML) models on Amazon SageMaker. Enterprise customers in tightly controlled industries such as healthcare and finance set up security guardrails to ensure their data is encrypted and traffic doesn’t traverse the internet. To ensure the SageMaker training and deployment of […]
( 10
min )
Getting AWS Certified can help you propel your career, whether you’re looking to find a new role, showcase your skills to take on a new project, or become your team’s go-to expert. And because AWS Certification exams are created by experts in the relevant role or technical area, preparing for one of these exams helps […]
( 10
min )
Selecting the right method gives users a more accurate picture of how their model is behaving, so they are better equipped to correctly interpret its predictions.
( 9
min )
Researchers develop an algorithm that decides when a “student” machine should follow its teacher, and when it should learn on its own.
( 10
min )
The Internet is a great place to hang out in. And it is also the place where cybercrimes are committed, grow, and evolve. Just like any other crime, cybercriminals also come up with innovative ideas from time to time to do damage to businesses as well as individuals. If we look at the numbers, the… Read More »Top 4 cybersecurity certifications that will get you hired
The post Top 4 cybersecurity certifications that will get you hired appeared first on Data Science Central.
( 20
min )
As technology continues to advance rapidly, the realm of education is not immune to its transformative effects. One area that has seen significant progress is exam evaluation. Traditionally, grading exams has been a time-consuming and subjective process, prone to human error and bias. However, with the emergence of automated grading systems powered by Artificial Intelligence… Read More »Automated Grading Systems: How AI is Revolutionizing Exam Evaluation
The post Automated Grading Systems: How AI is Revolutionizing Exam Evaluation appeared first on Data Science Central.
( 22
min )
We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.
( 4
min )
Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, […]
( 8
min )
In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support plan. Since its introduction, we have helped hundreds of customers optimize their workloads, set guardrails, and improve visibility of their machine learning (ML) workloads’ cost and usage. In this series of posts, we share lessons learned about optimizing costs in […]
( 18
min )
In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support plan. Since its introduction, we’ve helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. In this series of posts, we share lessons learned about optimizing costs in […]
( 8
min )
In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support plan. Since its introduction, we’ve helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. In this series of posts, we share lessons learned about optimizing costs in […]
( 10
min )
In 2021, we launched AWS Support Proactive Services as part of the AWS Enterprise Support offering. Since its introduction, we have helped hundreds of customers optimize their workloads, set guardrails, and improve the visibility of their machine learning (ML) workloads’ cost and usage. In this series of posts, we share lessons learned about optimizing costs […]
( 15
min )
Cost optimization is one of the pillars of the AWS Well-Architected Framework, and it’s a continual process of refinement and improvement over the span of a workload’s lifecycle. It enables building and operating cost-aware systems that minimize costs, maximize return on investment, and achieve business outcomes. Amazon SageMaker is a fully managed machine learning (ML) […]
( 11
min )
Amazon SageMaker Ground Truth Plus helps you prepare high-quality training datasets by removing the undifferentiated heavy lifting associated with building data labeling applications and managing the labeling workforce. All you do is share data along with labeling requirements, and Ground Truth Plus sets up and manages your data labeling workflow based on these requirements. From […]
( 13
min )
Providing healthcare in remote or rural areas is challenging, particularly specialized medicine and surgical procedures. Patients may need to travel long distances just to get to medical facilities and to communicate with caregivers. They may not arrive in time to receive essential information before their medical appointments and may have to return home before they can receive crucial follow-up care at the hospital. Some patients may wait several days just to meet with their surgeon. This is a very different experience from that of urban or suburban residents or people in more developed areas, where patients can get to a nearby clinic or hospital with relative ease.
The post 3D telemedicine brings better care to underserved and rural communities, even across continents appeared first on Microsoft Research.
( 13
min )
Last week in Cambridge was a Hinton bonanza. He visited the university town where he was once an undergraduate in experimental psychology, and gave a series of back-to-back talks, Q&A sessions, interviews, dinners, etc. He was stopped on the street by random passers-by who recognised him from the lecture,
( 8
min )
New 14-inch NVIDIA Studio laptops, equipped with GeForce RTX 40 Series Laptop GPUs, give creators peak portability with a significant increase in performance over the last generation.
( 9
min )
MediaTek, a leading innovator in connectivity and multimedia, is teaming with NVIDIA to bring drivers and passengers new experiences inside the car. The partnership was announced today at a COMPUTEX press conference with MediaTek CEO Rick Tsai and NVIDIA founder and CEO Jensen Huang. “NVIDIA is a world-renowned pioneer and industry leader in AI and Read article >
( 6
min )
In his first live keynote since the pandemic, NVIDIA founder and CEO Jensen Huang today kicked off the COMPUTEX conference in Taipei, announcing platforms that companies can use to ride a historic wave of generative AI that’s transforming industries from advertising to manufacturing to telecom. “We’re back,” Huang roared as he took the stage after Read article >
( 10
min )
As mobile robot shipments surge to meet the growing demands of industries seeking operational efficiencies, NVIDIA is launching a new platform to enable the next generation of autonomous mobile robot (AMR) fleets. Isaac AMR brings advanced mapping, autonomy and simulation to mobile robots and will soon be available for early customers, NVIDIA founder and CEO Read article >
( 5
min )
How do you help robots build better robots? By simulating even more robots. NVIDIA founder and CEO Jensen Huang today showcased how leading electronics manufacturer Quanta is using AI-enabled robots to inspect the quality of its products. In his keynote speech at this week’s COMPUTEX trade show in Taipei, Huang presented on how electronics manufacturers Read article >
( 6
min )
The $46 trillion global electronics manufacturing industry spans more than 10 million factories worldwide, where much is at stake in producing defect-free products. To drive product excellence, leading electronics manufacturers are adopting NVIDIA Metropolis for Factories. More than 50 manufacturing giants and industrial automation providers — including Foxconn Industrial Internet, Pegatron, Quanta, Siemens and Wistron Read article >
( 6
min )
Generative AI is rapidly ushering in a new era of computing for productivity, content creation, gaming and more. Generative AI models and applications — like NVIDIA NeMo and DLSS 3 Frame Generation, Meta LLaMa, ChatGPT, Adobe Firefly and Stable Diffusion — use neural networks to identify patterns and structures within existing data to generate new Read article >
( 7
min )
“You are running for food, or you are running from becoming food. And often times, you can’t tell which. Either way, run.” NVIDIA founder and CEO Jensen Huang today urged graduates of National Taiwan University to run hard to seize the unprecedented opportunities that AI will present, but embrace the inevitable failures along the way. Read article >
( 5
min )
The primary goal in recommendation is to suggest relevant content to users,
but optimizing for accuracy often results in recommendations that lack
diversity. To remedy this, conventional approaches such as re-ranking improve
diversity by presenting more diverse items. Here we argue that to promote
inherent and prolonged diversity, the system must encourage its creation.
Towards this, we harness the performative nature of recommendation, and show
how learning can incentivize strategic content creators to create diverse
content. Our approach relies on a novel form of regularization that anticipates
strategic changes to content, and penalizes for content homogeneity. We provide
analytic and empirical results that demonstrate when and how diversity can be
incentivized, and experimentally demonstrate the utility of our approach on
synthetic and semi-synthetic data.
( 2
min )
This paper delves into stochastic optimization problems that involve
Markovian noise. We present a unified approach for the theoretical analysis of
first-order gradient methods for stochastic optimization and variational
inequalities. Our approach covers scenarios for both non-convex and strongly
convex minimization problems. To achieve an optimal (linear) dependence on the
mixing time of the underlying noise sequence, we use the randomized batching
scheme, which is based on the multilevel Monte Carlo method. Moreover, our
technique allows us to eliminate the limiting assumptions of previous research
on Markov noise, such as the need for a bounded domain and uniformly bounded
stochastic gradients. Our extension to variational inequalities under Markovian
noise is original. Additionally, we provide lower bounds that match the oracle
complexity of our method in the case of strongly convex optimization problems.
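The randomized batching idea can be sketched generically: write the target as a telescoping sum of batch means of growing size, sample a single random level, and reweight by its probability so the estimator stays unbiased at O(1) expected cost per query. Below, `batch_mean` is a hypothetical oracle standing in for averaged Markovian stochastic gradients; this is a generic multilevel Monte Carlo debiasing sketch, not the paper's method.

```python
import numpy as np

def mlmc_estimate(batch_mean, max_level=6, rng=None):
    """Randomized multilevel Monte Carlo estimator (sketch). batch_mean(n)
    returns the average of n correlated (Markovian) stochastic gradients;
    larger n means less bias from the mixing time. Sampling level J with
    P(J=j) ~ 2^-j and reweighting the telescoping correction by 1/P(J=j)
    keeps the estimator unbiased for the highest-level mean."""
    rng = rng or np.random.default_rng()
    probs = 0.5 ** np.arange(1, max_level + 1)
    probs[-1] += 1.0 - probs.sum()            # make the levels sum to 1
    j = rng.choice(max_level, p=probs) + 1
    correction = (batch_mean(2 ** j) - batch_mean(2 ** (j - 1))) / probs[j - 1]
    return batch_mean(1) + correction

rng = np.random.default_rng(0)
state = [0.0]
def batch_mean(n):
    # mean of n steps of an AR(1) chain (a stand-in for Markovian noise
    # with stationary mean 0)
    out = []
    for _ in range(n):
        state[0] = 0.9 * state[0] + rng.normal()
        out.append(state[0])
    return np.mean(out)

est = np.mean([mlmc_estimate(batch_mean, rng=rng) for _ in range(200)])
print(est)
```

The expected batch size is sum_j 2^j * 2^-j = O(max_level), i.e. only logarithmic in the largest batch, which is how the linear dependence on the mixing time is achieved.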
( 2
min )
The increasing adoption of text-to-speech technologies has led to a growing
demand for natural and emotive voices that adapt to a conversation's context
and emotional tone. The Emotive Narrative Storytelling (EMNS) corpus is a
unique speech dataset created to enhance conversations' expressiveness and
emotive quality in interactive narrative-driven systems. The corpus consists of
a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance
of 0.68%, along with expressiveness levels and natural language descriptions
with word emphasis labels. The evaluation of audio samples from different
datasets revealed that the EMNS corpus achieved the highest average scores in
accurately conveying emotions and demonstrating expressiveness. It outperformed
other datasets in conveying shared emotions and achieved comparable levels of
genuineness. A classification task confirmed the accurate representation of
intended emotions in the corpus, with participants recognising the recordings
as genuine and expressive. Additionally, the availability of the dataset
collection tool under the Apache 2.0 License simplifies remote speech data
collection for researchers.
( 2
min )
Energy markets can provide incentives for undesired behavior of market
participants. Multi-agent Reinforcement learning (MARL) is a promising new
approach to predicting the expected behavior of energy market participants.
However, reinforcement learning requires many interactions with the system to
converge, and the power system environment often consists of extensive
computations, e.g., optimal power flow (OPF) calculation for market clearing.
To tackle this complexity, we provide a model of the energy market to a basic
MARL algorithm in the form of a learned OPF approximation and explicit market
rules. The learned OPF surrogate model makes explicitly solving the OPF
completely unnecessary. Our experiments demonstrate that the model additionally
reduces training time by about one order of magnitude but at the cost of a
slightly worse approximation of the Nash equilibrium. Potential applications of
our method are market design, more realistic modeling of market participants,
and analysis of manipulative behavior.
( 2
min )
With the freight delivery demands and shipping costs increasing rapidly,
intelligent control of fleets to enable efficient and cost-conscious solutions
becomes an important problem. In this paper, we propose DeepFreight, a
model-free deep-reinforcement-learning-based algorithm for multi-transfer
freight delivery, which includes two closely-collaborative components:
truck-dispatch and package-matching. Specifically, a deep multi-agent
reinforcement learning framework called QMIX is leveraged to learn a dispatch
policy, with which we can obtain the multi-step joint vehicle dispatch
decisions for the fleet with respect to the delivery requests. Then an
efficient multi-transfer matching algorithm is executed to assign the delivery
requests to the trucks. Also, DeepFreight is integrated with a Mixed-Integer
Linear Programming optimizer for further optimization. The evaluation results
show that the proposed system is highly scalable and ensures 100% delivery
success while maintaining low delivery-time and fuel consumption. The codes are
available at https://github.com/LucasCJYSDL/DeepFreight.
( 2
min )
Unsupervised disentanglement is a long-standing challenge in representation
learning. Recently, self-supervised techniques achieved impressive results in
the sequential setting, where data is time-dependent. However, the latter
methods employ modality-based data augmentations and random sampling or solve
auxiliary tasks. In this work, we propose to avoid that by generating,
sampling, and comparing empirical distributions from the underlying variational
model. Unlike existing work, we introduce a self-supervised sequential
disentanglement framework based on contrastive estimation with no external
signals, while using common batch sizes and samples from the latent space
itself. In practice, we propose a unified, efficient, and easy-to-code sampling
strategy for semantically similar and dissimilar views of the data. We evaluate
our approach on video, audio, and time series benchmarks. Our method presents
state-of-the-art results in comparison to existing techniques. The code is
available at https://github.com/azencot-group/SPYL.
( 2
min )
Some of the most successful knowledge graph embedding (KGE) models for link
prediction -- CP, RESCAL, TuckER, ComplEx -- can be interpreted as energy-based
models. Under this perspective, they are not amenable to exact
maximum-likelihood estimation (MLE) or sampling, and they struggle to integrate logical
constraints. This work re-interprets the score functions of these KGEs as
circuits -- constrained computational graphs allowing efficient
marginalisation. Then, we design two recipes to obtain efficient generative
circuit models by either restricting their activations to be non-negative or
squaring their outputs. Our interpretation comes with little or no loss of
performance for link prediction, while the circuits framework unlocks exact
learning by MLE, efficient sampling of new triples, and guarantees that logical
constraints are satisfied by design. Furthermore, our models scale more
gracefully than the original KGEs on graphs with millions of entities.
( 2
min )
We consider a version of actor-critic which uses proportional step-sizes and
only one critic update with a single sample from the stationary distribution
per actor step. We provide an analysis of this method using the small-gain
theorem. Specifically, we prove that this method can be used to find a
stationary point, and that the resulting sample complexity improves the state
of the art for actor-critic methods to $O \left(\mu^{-2} \epsilon^{-2} \right)$
to find an $\epsilon$-approximate stationary point where $\mu$ is the condition
number associated with the critic.
( 2
min )
In recent years, numerous screening methods have been published for
ultrahigh-dimensional data that contain hundreds of thousands of features;
however, most of these methods cannot handle data with thousands of classes.
Prediction models built to authenticate users based on multichannel biometric
data result in this type of problem. In this study, we present a novel method
known as random forest-based multiround screening (RFMS) that can be
effectively applied under such circumstances. The proposed algorithm divides
the feature space into small subsets and executes a series of partial model
builds. These partial models are used to implement tournament-based sorting and
the selection of features based on their importance. To benchmark RFMS, a
synthetic biometric feature space generator known as BiometricBlender is
employed. Based on the results, the RFMS is on par with industry-standard
feature screening methods while simultaneously possessing many advantages over
these methods.
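The tournament structure is easy to illustrate: split the feature space into small subsets, score each subset with a partial model, and let only each group's best features advance to the next round. The sketch below uses a simple absolute-correlation score as a stand-in for the random-forest importances the paper relies on; all names are illustrative.

```python
import numpy as np

def tournament_screen(X, y, score_fn, group_size=50, keep=10, rounds=3, rng=None):
    """Multiround tournament feature screening (sketch of the RFMS idea with a
    pluggable scorer: score_fn(X_subset, y) -> per-feature importances)."""
    rng = rng or np.random.default_rng()
    features = np.arange(X.shape[1])
    for _ in range(rounds):
        rng.shuffle(features)
        survivors = []
        # split the feature space into small subsets; each group's best advance
        for g in np.array_split(features, max(1, len(features) // group_size)):
            imp = score_fn(X[:, g], y)
            survivors.extend(g[np.argsort(imp)[::-1][:keep]])
        features = np.array(survivors)
        if len(features) <= keep:
            break
    return np.sort(features)

def abs_corr(Xs, y):
    Xc = Xs - Xs.mean(0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
y = X[:, 3] + X[:, 7] + 0.1 * rng.normal(size=200)  # only features 3 and 7 matter
selected = tournament_screen(X, y, abs_corr, rng=rng)
print(3 in selected and 7 in selected)  # → True
```

Because each partial model only ever sees a small subset of features, memory stays bounded even when the full feature space has hundreds of thousands of columns.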
( 2
min )
A crucial problem in reinforcement learning is learning the optimal policy.
We study this in tabular infinite-horizon discounted Markov decision processes
under the online setting. The existing algorithms either fail to achieve regret
optimality or have to incur a high memory and computational cost. In addition,
existing optimal algorithms all require a long burn-in time in order to achieve
optimal sample efficiency, i.e., their optimality is not guaranteed unless
the sample size surpasses a high threshold. We address both open problems by
introducing a model-free algorithm that employs variance reduction and a novel
technique that switches the execution policy in a slow-yet-adaptive manner.
This is the first regret-optimal model-free algorithm in the discounted
setting, with the additional benefit of a low burn-in time.
( 2
min )
A key challenge for a reinforcement learning (RL) agent is to incorporate
external/expert advice in its learning. The desired goals of an algorithm that
can shape the learning of an RL agent with external advice include (a)
maintaining policy invariance; (b) accelerating the learning of the agent; and
(c) learning from arbitrary advice [3]. To address this challenge this paper
formulates the problem of incorporating external advice in RL as a multi-armed
bandit called shaping-bandits. The reward of each arm of shaping bandits
corresponds to the return obtained by following the expert or by following a
default RL algorithm learning on the true environment reward. We show that
directly applying existing bandit and shaping algorithms that do not reason
about the non-stationary nature of the underlying returns can lead to poor
results. Thus we propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES
(LPIES), three different shaping algorithms built on different assumptions that
reason about the long-term consequences of following the expert policy or the
default RL algorithm. Our experiments in four different settings show that
these proposed algorithms achieve the above-mentioned goals whereas the other
algorithms fail to do so.
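The shaping-bandits framing reduces, in its simplest form, to a two-armed bandit: one arm is "follow the expert" and the other is "follow the default RL algorithm", each arm's reward being the return of running that choice. A plain UCB1 sketch over such arms (illustrative only; the paper's UCB-PIES additionally reasons about long-term consequences and non-stationary returns):

```python
import numpy as np

def ucb_shaping(arm_returns, horizon=2000, rng=None):
    """Plain UCB1 over shaping arms: arm_returns[a](rng) draws a noisy
    episode return for arm a (expert policy or default RL algorithm)."""
    rng = rng or np.random.default_rng()
    k = len(arm_returns)
    counts = np.zeros(k)
    means = np.zeros(k)
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1                       # play every arm once to initialize
        else:
            a = np.argmax(means + np.sqrt(2 * np.log(t) / counts))
        r = arm_returns[a](rng)             # noisy episode return
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return counts

# arm 0: follow a mediocre expert; arm 1: the default RL algorithm (better here)
arms = [lambda rng: 0.4 + 0.1 * rng.normal(),
        lambda rng: 0.7 + 0.1 * rng.normal()]
pulls = ucb_shaping(arms, rng=np.random.default_rng(0))
print(pulls[1] > pulls[0])  # → True
```

The paper's point is that this naive stationary treatment can fail, because the return of "learn on your own" improves as the agent learns, which is why the PIES variants reason about non-stationarity explicitly.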
( 2
min )
Convolutional Neural Network (CNN) is one of the most important architectures
in deep learning. The fundamental building block of a CNN is a trainable
filter, represented as a discrete grid, used to perform convolution on discrete
input data. In this work, we propose a continuous version of a trainable
convolutional filter able to work also with unstructured data. This new
framework allows exploring CNNs beyond discrete domains, enlarging the usage of
this important learning technique for many more complex problems. Our
experiments show that the continuous filter can achieve a level of accuracy
comparable to the state-of-the-art discrete filter, and that it can be used in
current deep learning architectures as a building block to solve problems with
unstructured domains as well.
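The core idea can be shown in one dimension: replace the discrete weight grid with a function of the continuous offset between a query centre and each input point, so the filter can be evaluated at arbitrary (unstructured) sample locations. In the sketch below a fixed smooth bump stands in for the trainable function (the paper's filter is learned); all names are illustrative.

```python
import numpy as np

def continuous_conv(points, values, centers, filter_fn, radius=0.5):
    """Continuous convolution on unstructured 1D data: the filter is a
    function of the continuous offset, not a discrete weight grid."""
    out = np.zeros(len(centers))
    for i, c in enumerate(centers):
        offsets = points - c
        mask = np.abs(offsets) <= radius           # local support, as in a CNN
        out[i] = np.sum(filter_fn(offsets[mask]) * values[mask])
    return out

# a smooth bump standing in for a small trainable network over offsets
filter_fn = lambda t: np.exp(-(t / 0.2) ** 2)

rng = np.random.default_rng(0)
points = np.sort(rng.uniform(0, 1, size=100))      # unstructured sample sites
values = np.sin(2 * np.pi * points)
centers = np.linspace(0.1, 0.9, 5)
out = continuous_conv(points, values, centers, filter_fn)
print(out.shape)  # (5,)
```

Since `filter_fn` is evaluated pointwise, nothing requires the inputs to lie on a grid, which is what lets the construction extend CNNs to unstructured domains.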
( 2
min )
Recently, Montasser et al. [2019] showed that finite VC dimension is not
sufficient for proper adversarially robust PAC learning. In light of this
hardness, there is a growing effort to study what type of relaxations to the
adversarially robust PAC learning setup can enable proper learnability. In this
work, we initiate the study of proper learning under relaxations of the
worst-case robust loss. We give a family of robust loss relaxations under which
VC classes are properly PAC learnable with sample complexity close to what one
would require in the standard PAC learning setup. On the other hand, we show
that for an existing and natural relaxation of the worst-case robust loss,
finite VC dimension is not sufficient for proper learning. Lastly, we give new
generalization guarantees for the adversarially robust empirical risk
minimizer.
( 2
min )
We investigate the convergence of stochastic mirror descent (SMD) under
interpolation in relatively smooth and smooth convex optimization. In
relatively smooth convex optimization we provide new convergence guarantees for
SMD with a constant stepsize. For smooth convex optimization we propose a new
adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS).
Notably, our convergence results in both settings do not make bounded gradient
assumptions or bounded variance assumptions, and we show convergence to a
neighborhood that vanishes under interpolation. Consequently, these results
correspond to the first convergence guarantees under interpolation for the
exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS
generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et
al. 2021) to mirror descent and remains both practical and efficient for modern
machine learning applications while inheriting the benefits of mirror descent.
We complement our results with experiments across various supervised learning
tasks and different instances of SMD, demonstrating the effectiveness of mSPS.
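A rough sketch of the stepsize idea on the simplex, where mirror descent with the entropy mirror map becomes the exponentiated gradient update: set the stepsize from the current loss over the squared gradient norm (assuming the interpolation optimum has loss zero), capped at a maximum. This is an illustrative simplification with the infinity norm standing in for the proper dual norm, not the paper's exact mSPS formula.

```python
import numpy as np

def msps_exponentiated_gradient(loss_grad, x0, steps=200, c=1.0, eta_max=1.0):
    """Exponentiated gradient with a Polyak-style stepsize (sketch):
    eta_t = min(f(x_t) / (c * ||g_t||_inf^2), eta_max), assuming f* = 0,
    followed by a multiplicative-weights update on the simplex."""
    x = x0.copy()
    for _ in range(steps):
        f, g = loss_grad(x)
        eta = min(f / (c * np.max(np.abs(g)) ** 2 + 1e-12), eta_max)
        x = x * np.exp(-eta * g)
        x /= x.sum()                      # re-normalize onto the simplex
    return x

# toy problem on the simplex: f(x) = 0.5 ||x - e_1||^2, minimized at a vertex
target = np.array([1.0, 0.0, 0.0])
loss_grad = lambda x: (0.5 * np.sum((x - target) ** 2), x - target)
x = msps_exponentiated_gradient(loss_grad, np.ones(3) / 3)
print(np.argmax(x))  # → 0
```

Note the stepsize adapts automatically: it shrinks as the loss approaches zero, which is how convergence is obtained without tuning or bounded-gradient assumptions.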
( 2
min )
Continuous monitoring with an ever-increasing number of sensors has become
ubiquitous across many application domains. However, acquired time series are
typically high-dimensional and difficult to interpret. Expressive deep learning
(DL) models have gained popularity for dimensionality reduction, but the
resulting latent space often remains difficult to interpret. In this work we
propose SOM-CPC, a model that visualizes data in an organized 2D manifold,
while preserving higher-dimensional information. We address a largely
unexplored and challenging set of scenarios comprising high-rate time series,
and show on both synthetic and real-life data (physiological data and audio
recordings) that SOM-CPC outperforms strong baselines like DL-based feature
extraction, followed by conventional dimensionality reduction techniques, and
models that jointly optimize a DL model and a Self-Organizing Map (SOM).
SOM-CPC has great potential to acquire a better understanding of latent
patterns in high-rate data streams.
( 2
min )
Identifiability of latent variable models has recently gained interest in
terms of its applications to interpretability or out of distribution
generalisation. In this work, we study identifiability of Markov Switching
Models as a first step towards extending recent results to sequential latent
variable models. We present identifiability conditions within first-order
Markov dependency structures, and parametrise the transition distribution via
non-linear Gaussians. Our experiments showcase the applicability of our
approach for regime-dependent causal discovery and high-dimensional time series
segmentation.
( 2
min )
Masked Language Models (MLMs) have proven to be effective for second-pass
rescoring in Automatic Speech Recognition (ASR) systems. In this work, we
propose Masked Audio Text Encoder (MATE), a multi-modal masked language model
rescorer which incorporates acoustic representations into the input space of
MLM. We adopt contrastive learning for effectively aligning the modalities by
learning shared representations. We show that using a multi-modal rescorer is
beneficial for domain generalization of the ASR system when target domain data
is unavailable. MATE reduces word error rate (WER) by 4%-16% on in-domain, and
3%-7% on out-of-domain datasets, over the text-only baseline. Additionally,
with a very limited amount of training data (0.8 hours), MATE achieves a WER
reduction of 8%-23% over the first-pass baseline.
( 2
min )
Community detection is an important problem in unsupervised learning. This
paper proposes to solve a projection matrix approximation problem with an
additional entrywise bounded constraint. Algorithmically, we introduce a new
differentiable convex penalty and derive an alternating direction method of
multipliers (ADMM) algorithm. Theoretically, we establish the convergence
properties of the proposed algorithm. Numerical experiments demonstrate the
superiority of our algorithm over its competitors, such as the semi-definite
relaxation method and spectral clustering.
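For reference, the spectral clustering baseline mentioned above can be sketched in a few lines for the two-community case; this is an illustrative sketch of the standard method, not the paper's ADMM algorithm, and the toy graph is hypothetical:

```python
import numpy as np

def spectral_communities(A):
    """Two-way community detection from an adjacency matrix A via the
    Fiedler vector (second-smallest eigenvector) of the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A  # unnormalized Laplacian L = D - A
    _, vecs = np.linalg.eigh(L)     # eigenvectors in ascending eigenvalue order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)  # the sign of the Fiedler vector splits the graph

# Toy graph: two triangles joined by a single bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = spectral_communities(A)
```

On this toy graph the two triangles are recovered as the two communities (up to label swap).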
( 2
min )
Annotating data for multi-label classification is prohibitively expensive
because every category of interest must be confirmed to be present or absent.
Recent work on single positive multi-label (SPML) learning shows that it is
possible to train effective multi-label classifiers using only one positive
label per image. However, the standard benchmarks for SPML are derived from
traditional multi-label classification datasets by retaining one positive label
for each training example (chosen uniformly at random) and discarding all other
labels. In realistic settings it is not likely that positive labels are chosen
uniformly at random. This work introduces protocols for studying label bias in
SPML and provides new empirical results.
( 2
min )
We study online influence maximization (OIM) under a new model of decreasing
cascade (DC). This model is a generalization of the independent cascade (IC)
model by considering the common phenomenon of market saturation. In DC, the
chance of an influence attempt being successful reduces with previous failures.
This effect is neglected by previous OIM works under the IC and linear threshold
models. We propose the DC-UCB algorithm to solve this problem, which achieves a
regret bound of the same order as the state-of-the-art works on the IC model.
Extensive experiments on both synthetic and real datasets show the
effectiveness of our algorithm.
( 2
min )
Machine learning has been applied to the problem of X-ray diffraction phase
prediction with promising results. In this paper, we describe a method for
using machine learning to predict crystal structure phases from X-ray
diffraction data of transition metals and their oxides. We evaluate the
performance of our method and compare its various settings. Our results
demonstrate that the proposed machine learning framework achieves competitive
performance. This demonstrates the potential for machine learning to
significantly impact the field of X-ray diffraction and crystal structure
determination. Open-source implementation:
https://github.com/maxnygma/NeuralXRD.
( 2
min )
I study a stochastic multi-arm bandit problem where rewards are subject to
adversarial corruption. I propose a novel attack strategy that manipulates a
learner employing the UCB algorithm into pulling some non-optimal target arm $T
- o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log
T})$, where $T$ is the number of rounds. I also prove the first lower bound on
the cumulative attack cost. The lower bound matches the upper bound up to
$O(\log \log T)$ factors, showing the proposed attack strategy to be near
optimal.
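For context, the learner under attack can be sketched as the standard UCB1 rule; this is an illustrative sketch of the learner only, not the paper's attack strategy, and the Bernoulli arms are hypothetical:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Standard UCB1: play each arm once, then pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = pull(arm)  # an attacker would corrupt this observation
        counts[arm] += 1
        sums[arm] += reward
    return counts

random.seed(0)
means = [0.2, 0.8]  # arm 1 is optimal
counts = ucb1(lambda a: float(random.random() < means[a]), n_arms=2, horizon=2000)
```

Uncorrupted, UCB1 concentrates its pulls on the optimal arm; the attack in the abstract corrupts rewards so that a target suboptimal arm is pulled $T - o(T)$ times instead.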
( 2
min )
We examine the characteristic activation values of individual ReLU units in
neural networks. We refer to the set of input points that attain these
characteristic activation values as the characteristic activation set of a
ReLU unit. We draw an explicit connection between the characteristic activation
set and learned features in ReLU networks. This connection leads to new
insights into why various neural network normalization techniques used in
modern deep learning architectures regularize and stabilize SGD optimization.
Utilizing these insights, we propose a geometric approach to parameterize ReLU
networks for improved feature learning. We empirically verify its usefulness
with less carefully chosen initialization schemes and larger learning rates. We
report improved optimization stability, faster convergence speed, and better
generalization performance.
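To make the central object concrete: for a single ReLU unit $\phi(x)=\max(0, w^\top x + b)$, the boundary of the region where the unit is active is the hyperplane $\{x : w^\top x + b = 0\}$. A minimal sketch of this level-zero case (the paper's precise definition of the characteristic activation set may differ):

```python
import numpy as np

def signed_distance_to_boundary(w, b, x):
    """Signed Euclidean distance from x to the hyperplane {x : w.x + b = 0},
    where a ReLU unit max(0, w.x + b) switches on."""
    return (w @ x + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])  # ||w|| = 5
b = -5.0
on_boundary = signed_distance_to_boundary(w, b, np.array([1.0, 0.5]))  # 0.0
active_side = signed_distance_to_boundary(w, b, np.array([1.0, 1.0]))  # 0.4
```

Geometric parameterizations of this boundary (direction of $w$ and offset along it) are one natural way to reason about how normalization schemes stabilize the features a unit learns.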
( 2
min )
Machine learning systems such as large scale recommendation systems or
natural language processing systems are usually trained on billions of training
points and are associated with hundreds of billions or trillions of parameters.
Improving the learning process in such a way that both the training load is
reduced and the model accuracy improved is highly desired. In this paper we
take a first step toward solving this problem, studying influence functions
from the perspective of simplifying the computations they involve. We discuss
assumptions, under which influence computations can be performed on
significantly fewer parameters. We also demonstrate that the sign of the
influence value can indicate whether a training point is memorized, as
opposed to generalized upon. To this end, we formally define what
memorization means for a training point, as opposed to generalization. We
conclude that influence functions can be made practical, even for large scale
machine learning systems, and that influence values can be taken into account
by algorithms that selectively remove training points, as part of the learning
process.
( 2
min )
This paper delves into stochastic optimization problems that involve
Markovian noise. We present a unified approach for the theoretical analysis of
first-order gradient methods for stochastic optimization and variational
inequalities. Our approach covers scenarios for both non-convex and strongly
convex minimization problems. To achieve an optimal (linear) dependence on the
mixing time of the underlying noise sequence, we use the randomized batching
scheme, which is based on the multilevel Monte Carlo method. Moreover, our
technique allows us to eliminate the limiting assumptions of previous research
on Markov noise, such as the need for a bounded domain and uniformly bounded
stochastic gradients. Our extension to variational inequalities under Markovian
noise is original. Additionally, we provide lower bounds that match the oracle
complexity of our method in the case of strongly convex optimization problems.
( 2
min )
The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than
one hidden layer has multiple local minima, corresponding to matrices with
different ranks. In tasks such as matrix completion, the goal is to converge to
the local minimum with the smallest rank that still fits the training data.
While rank-underestimating minima can easily be avoided since they do not fit
the data, gradient descent might get stuck at rank-overestimating minima. We
show that with SGD, there is always a probability to jump from a higher rank
minimum to a lower rank one, but the probability of jumping back is zero. More
precisely, we define a sequence of sets $B_{1}\subset B_{2}\subset\cdots\subset
B_{R}$ so that $B_{r}$ contains all minima of rank $r$ or less (and not more)
that are absorbing for small enough ridge parameters $\lambda$ and learning
rates $\eta$: SGD has probability 0 of leaving $B_{r}$, and from any starting
point there is a non-zero probability for SGD to enter $B_{r}$.
( 2
min )
We study the problem of approximate sampling from non-log-concave
distributions, e.g., Gaussian mixtures, which is often challenging even in low
dimensions due to their multimodality. We focus on performing this task via
Markov chain Monte Carlo (MCMC) methods derived from discretizations of the
overdamped Langevin diffusions, which are commonly known as Langevin Monte
Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases
for which a large class of proximal MCMC methods have been developed: (i) a
nonsmooth prior is considered with a Gaussian mixture likelihood; (ii) a
Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling
tasks arise from a wide range of applications to Bayesian inference and imaging
inverse problems such as image deconvolution. We perform numerical simulations
to compare the performance of most commonly used Langevin Monte Carlo
algorithms.
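The simplest member of this family is the unadjusted Langevin algorithm (ULA), which discretizes the overdamped Langevin diffusion as $x_{k+1} = x_k + \gamma \nabla \log \pi(x_k) + \sqrt{2\gamma}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$. A minimal sketch on a standard Gaussian target, where $\nabla \log \pi(x) = -x$ (step size and chain length are illustrative choices, not the paper's settings):

```python
import numpy as np

def ula(grad_log_pi, x0, step, n_steps, rng):
    """Unadjusted Langevin Algorithm:
    x_{k+1} = x_k + step * grad_log_pi(x_k) + sqrt(2 * step) * N(0, I)."""
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        chain[k] = x
    return chain

rng = np.random.default_rng(0)
# Target pi = N(0, 1), so grad log pi(x) = -x.
chain = ula(lambda x: -x, x0=np.zeros(1), step=0.05, n_steps=40000, rng=rng)
burned = chain[10000:]  # discard burn-in before computing moments
```

Proximal variants replace the gradient of a nonsmooth term with its proximal operator, which is what makes the nonsmooth cases (i) and (ii) above tractable.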
( 2
min )
Recently, Montasser et al. [2019] showed that finite VC dimension is not
sufficient for proper adversarially robust PAC learning. In light of this
hardness, there is a growing effort to study what type of relaxations to the
adversarially robust PAC learning setup can enable proper learnability. In this
work, we initiate the study of proper learning under relaxations of the
worst-case robust loss. We give a family of robust loss relaxations under which
VC classes are properly PAC learnable with sample complexity close to what one
would require in the standard PAC learning setup. On the other hand, we show
that for an existing and natural relaxation of the worst-case robust loss,
finite VC dimension is not sufficient for proper learning. Lastly, we give new
generalization guarantees for the adversarially robust empirical risk
minimizer.
( 2
min )
Generative Adversarial Networks (GANs) have shown immense potential in fields
far from physics, such as in text and image generation. Here we use GANs to
learn a prototypical stochastic process on a lattice. By suitably adding noise
to the original data we succeed in bringing both the Generator and the
Discriminator loss functions close to their ideal value. However, as typical
for adversarial approaches, oscillations persist. This undermines model
selection and the quality of the generated trajectory. We demonstrate that a
suitable multi-model procedure where stochastic trajectories are advanced at
each step upon randomly selecting a Generator leads to a remarkable increase in
accuracy. Based on the reported findings, GANs appear to be a promising tool
for tackling complex statistical dynamics.
( 2
min )
Multilabel ranking is a central task in machine learning. However, the most
fundamental question of learnability in a multilabel ranking setting with
relevance-score feedback remains unanswered. In this work, we characterize the
learnability of multilabel ranking problems in both batch and online settings
for a large family of ranking losses. Along the way, we give two equivalence
classes of ranking losses based on learnability that capture most, if not all,
losses used in practice.
( 2
min )
We apply the Hierarchical Autoregressive Neural (HAN) network sampling
algorithm to the two-dimensional $Q$-state Potts model and perform simulations
around the phase transition at $Q=12$. We quantify the performance of the
approach in the vicinity of the first-order phase transition and compare it
with that of the Wolff cluster algorithm. We find a significant improvement as
far as the statistical uncertainty is concerned at a similar numerical effort.
In order to efficiently train large neural networks we introduce the technique
of pre-training. It allows training some neural networks using smaller system
sizes and then employing them as starting configurations for larger system
sizes. This is possible due to the recursive construction of our hierarchical
approach. Our results serve as a demonstration of the performance of the
hierarchical approach for systems exhibiting bimodal distributions.
Additionally, we provide estimates of the free energy and entropy in the
vicinity of the phase transition with statistical uncertainties of the order of
$10^{-7}$ for the former and $10^{-3}$ for the latter, based on statistics of
$10^6$ configurations.
( 2
min )
The approximation properties of infinitely wide shallow neural networks
heavily depend on the choice of the activation function. To understand this
influence, we study embeddings between Barron spaces with different activation
functions. These embeddings are proven by providing push-forward maps on the
measures $\mu$ used to represent functions $f$. An activation function of
particular interest is the rectified power unit ($\operatorname{RePU}$) given
by $\operatorname{RePU}_s(x)=\max(0,x)^s$. For many commonly used activation
functions, the well-known Taylor remainder theorem can be used to construct a
push-forward map, which allows us to prove the embedding of the associated
Barron space into a Barron space with a $\operatorname{RePU}$ as activation
function. Moreover, the Barron spaces associated with the
$\operatorname{RePU}_s$ have a hierarchical structure similar to the Sobolev
spaces $H^m$.
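The $\operatorname{RePU}$ activation from the abstract is simple to state in code; $s = 1$ recovers the ReLU, and larger $s$ yields smoother activations:

```python
def repu(x, s):
    """Rectified power unit: RePU_s(x) = max(0, x) ** s."""
    return max(0.0, x) ** s

relu_value = repu(0.5, 1)  # s = 1 is the ordinary ReLU: 0.5
squared = repu(2.0, 2)     # positive inputs are raised to the power s: 4.0
clipped = repu(-3.0, 2)    # negative inputs map to 0.0
```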
( 2
min )
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best
arm identification in stochastic bandits. It is the first instance of a Top Two
algorithm analyzed for approximate best arm identification. EB-TC$\varepsilon$
is an *anytime* sampling rule that can therefore be employed without
modification for fixed confidence or fixed budget identification (without prior
knowledge of the budget). We provide three types of theoretical guarantees for
EB-TC$\varepsilon$. First, we prove bounds on its expected sample complexity in
the fixed confidence setting, notably showing its asymptotic optimality in
combination with an adaptive tuning of its exploration parameter. We complement
these findings with upper bounds on its probability of error at any time and
for any error parameter, which further yield upper bounds on its simple regret
at any time. Finally, we show through numerical simulations that
EB-TC$\varepsilon$ performs favorably compared to existing algorithms, in
different settings.
( 2
min )
The research explores the influence of preschool attendance (one year before
full-time school) on the development of children during their first year of
school. Using data collected by the Australian Early Development Census, the
findings show that areas with high proportions of preschool attendance tended
to have lower proportions of children with at least one developmental
vulnerability. Developmental vulnerabilities include being unable to cope with
the school day (tired, hungry, low energy), being unable to get along with
others, aggressive behaviour, and trouble with reading, writing, or numbers.
These findings, of course, vary by region. Using data analysis and machine
learning, the
researchers were able to identify three distinct clusters within Queensland,
each characterised by different socio-demographic variables influencing the
relationship between preschool attendance and developmental vulnerability.
These analyses contribute to understanding regions with high vulnerability and
the potential need for tailored policies or investments.
( 2
min )
Many two-sample network hypothesis testing methodologies operate under the
implicit assumption that the vertex correspondence across networks is a priori
known. In this paper, we consider the degradation of power in two-sample graph
hypothesis testing when there are misaligned/label-shuffled vertices across
networks. In the context of random dot product and stochastic block model
networks, we theoretically explore the power loss due to shuffling for a pair
of hypothesis tests based on Frobenius norm differences between estimated edge
probability matrices or between adjacency matrices. The loss in testing power
is further demonstrated by numerous simulations and experiments, both in the
stochastic block model and in the random dot product graph model, where we
compare the power loss across multiple recently proposed tests in the
literature. Lastly, we demonstrate the impact that shuffling can have in
real-data testing in a pair of examples from neuroscience and from social
network analysis.
( 2
min )
We introduce Markov Neural Processes (MNPs), a new class of Stochastic
Processes (SPs) which are constructed by stacking sequences of neural
parameterised Markov transition operators in function space. We prove that
these Markov transition operators can preserve the exchangeability and
consistency of SPs. Therefore, the proposed iterative construction adds
substantial flexibility and expressivity to the original framework of Neural
Processes (NPs) without compromising consistency or adding restrictions. Our
experiments demonstrate clear advantages of MNPs over baseline models on a
variety of tasks.
( 2
min )
A crucial problem in reinforcement learning is learning the optimal policy.
We study this in tabular infinite-horizon discounted Markov decision processes
under the online setting. The existing algorithms either fail to achieve regret
optimality or have to incur a high memory and computational cost. In addition,
existing optimal algorithms all require a long burn-in time in order to achieve
optimal sample efficiency, i.e., their optimality is not guaranteed unless
sample size surpasses a high threshold. We address both open problems by
introducing a model-free algorithm that employs variance reduction and a novel
technique that switches the execution policy in a slow-yet-adaptive manner.
This is the first regret-optimal model-free algorithm in the discounted
setting, with the additional benefit of a low burn-in time.
( 2
min )
Text-to-image generation is a task in which a machine learning (ML) model generates an image from a textual description. The goal is to generate an image that closely matches the description, capturing the details and nuances of the text. This task is challenging because it requires the model to understand the semantics and syntax of […]
( 15
min )
AI Weirdness: the strange side of machine learning
( 2
min )
One of the most common applications of generative AI and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Amazon Lex provides the framework for building AI based chatbots. Pre-trained foundation models (FMs) perform well at natural language understanding (NLU) tasks such as summarization, text generation and question […]
( 12
min )
Amazon Kendra is a highly accurate and intelligent search service that enables users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. With Amazon Kendra, you can find relevant answers to your questions quickly, without sifting through documents. However, just enabling end-users to get the answers to their queries […]
( 10
min )
This post was co-authored by Brian Curry (Founder and Head of Products at OCX Cognition) and Sandhya MN (Data Science Lead at InfoGain) OCX Cognition is a San Francisco Bay Area-based startup, offering a commercial B2B software as a service (SaaS) product called Spectrum AI. Spectrum AI is a predictive (generative) CX analytics platform for […]
( 8
min )
Two years after he spoke at a conference detailing his ambitious vision for cooling tomorrow’s data centers, Ali Heydari and his team won a $5 million grant to go build it. It was the largest of 15 awards in May from the U.S. Department of Energy. The DoE program, called COOLERCHIPS, received more than 100 Read article >
( 6
min )
For about six years, AI has been an integral part of the artwork of Dominic Harris, a London-based digital artist who’s about to launch his biggest exhibition to date. “I use it for things like giving butterflies a natural sense of movement,” said Harris, whose typical canvas is an interactive computer display. Using a rack Read article >
( 6
min )
The machine-learning algorithm identified a compound that kills Acinetobacter baumannii, a bacterium that lurks in many hospital settings.
( 9
min )
It’s more important than ever for artificial intelligence to estimate how accurately it is explaining data.
( 8
min )
Data and its by-products dominate the world we live in. Smartphones and easy Internet access have increased this proliferation of data at a much higher rate than before. To make sense of this data and to use it for business advantage, companies analyze this huge amount of data to get insights. Such insights from text… Read More »Key benefits of using text visualizations for your business
The post Key benefits of using text visualizations for your business appeared first on Data Science Central.
( 22
min )
Cyber security experts face a tough challenge from the new type of quantum computers capable of easily breaking through security codes. Quantum computers, based on principles of quantum physics instead of standard electronic systems, are still nascent and do not have enough processing power to crack encryption keys. However, the experts at QDex Labs believe that the… Read More »Quantum resistant cryptography – bolstering cyber security against the threats posed by quantum computing
The post Quantum resistant cryptography – bolstering cyber security against the threats posed by quantum computing appeared first on Data Science Central.
( 19
min )
ChatGPT continues to revolutionize the way financial conversations are conducted, by providing its users with a fast and reliable tool for decision-making. The synergy between Bitcoin and ChatGPT is evident in how each technology enables the other to reach its full potential. Bitcoin provides an efficient payment system, while ChatGPT enhances conversational capabilities through natural… Read More »Exploring the Synergy between Bitcoin and ChatGPT: Empowering Financial Conversations
The post Exploring the Synergy between Bitcoin and ChatGPT: Empowering Financial Conversations appeared first on Data Science Central.
( 22
min )
There is no denying that Artificial Intelligence is revolutionizing the business landscape in almost every industry. With the advent of new possible applications and the ongoing process of improving existing ones, AI is opening up exciting opportunities for those ready to take them. One key trend in this industry is personalization and precision marketing, which… Read More »Personalization and precision marketing: Revenue streams in CPGs through AI
The post Personalization and precision marketing: Revenue streams in CPGs through AI appeared first on Data Science Central.
( 21
min )
Intelligent document processing (IDP) is a technology that automates the processing of high volumes of unstructured data, including text, images, and videos. IDP offers a significant improvement over manual methods and legacy optical character recognition (OCR) systems by addressing challenges such as cost, errors, low accuracy, and limited scalability, ultimately leading to better outcomes for […]
( 18
min )
In this three-part series, we present a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. This solution rides on a more significant global wave of increasing mortgage fraud, which is worsening as more people present […]
( 8
min )
Today we are excited to announce that you can now perform batch transforms with Amazon SageMaker JumpStart large language models (LLMs) for Text2Text Generation. Batch transforms are useful in situations where the responses don’t need to be real time and therefore you can do inference in batch for large datasets in bulk. For batch transform, […]
( 12
min )
The GeForce RTX 4060 Ti 8GB GPU is now available from top add-in card providers including ASUS, Colorful, Galax, GIGABYTE, INNO3D, MSI, Palit, PNY and ZOTAC, as well as from system integrators and builders worldwide.
( 7
min )
When we start learning Python, many times, we come across bad practices. In this article, you will learn the best practices to take your…
( 16
min )
Announcements TLADS and the Socratic Method: Bill Schmarzo’s Excellent Adventure Frequent Data Science Central contributor Bill Schmarzo has long touted the “Think Like a Data Scientist” methodology for business decisions. Bill notes that when leaders (and employees) “TLADS,” it provides a framework for value-based problem-solving and data-driven decision-making. By incorporating business context, stakeholder alignment and… Read More »DSC Weekly 23 May 2023 – TLADS and the Socratic Method: Bill Schmarzo’s Excellent Adventure
The post DSC Weekly 23 May 2023 – TLADS and the Socratic Method: Bill Schmarzo’s Excellent Adventure appeared first on Data Science Central.
( 19
min )
The healthcare industry relies heavily on accurate claims auditing to ensure proper reimbursement and financial stability. Claims auditors must determine the correct party, membership eligibility, contractual adherence, and fraud, waste, and abuse to accurately pay prepay and postpay healthcare claims. This is a difficult task with many obstacles. Healthcare reimbursement and financial stability depend… Read More »AI-Assisted Claims Auditing: Uncovering Errors Leading to Boosted Financial Recovery
The post AI-Assisted Claims Auditing: Uncovering Errors Leading to Boosted Financial Recovery appeared first on Data Science Central.
( 22
min )
By Jess Warrington, General Manager, North America, CloudBlue They say eCommerce is the new normal, but beyond simple selling, it has ushered in the next evolution of B2B transactions. Digital marketplaces enable tech vendors to broaden their reach and expand their catalog of products and services, giving companies the ability to package multiple types of… Read More »How Tech Vendors Can Embrace the Digital Marketplace Reset – Tips on navigating the digital marketplace-as-a-service landscape
The post How Tech Vendors Can Embrace the Digital Marketplace Reset – Tips on navigating the digital marketplace-as-a-service landscape appeared first on Data Science Central.
( 21
min )
Most of us agree that search is broken. It has not changed much in terms of user experience over the last two decades. To make matters worse, due to the SEO/ad driven focus, the results from search are often preceded by advertising. Gen Z has realised this and are using TikTok and other platforms as… Read More »LLM results in search – Google search perspectives and generative AI in search
The post LLM results in search – Google search perspectives and generative AI in search appeared first on Data Science Central.
( 19
min )
FAIR Data Forecast interview with Todd Carter “Most video assets are hugely underperforming,” Todd Carter, CTO of Resolute Square, said in our Personal Knowledge Graph working group interview with him. “I know you all are practitioners used to indexable metadata, but that’s not what we have here.” Resolute Square (RS) is a Public Benefit Corporation… Read More »Boosting video “surface area” for discoverability with knowledge graphs
The post Boosting video “surface area” for discoverability with knowledge graphs appeared first on Data Science Central.
( 20
min )
Smile, you are being watched. Over the past few years, facial recognition technology has captivated the world with its awe and apprehension. Everyone in the tech world knows about it, but few of us know what happens behind the scenes. Similar to celebrity gossip, everyone knows what happens behind the scenes regarding the latest celebrities,… Read More »The Future of Facial Recognition: Promoting Responsible Deployment and Ethical Practices
The post The Future of Facial Recognition: Promoting Responsible Deployment and Ethical Practices appeared first on Data Science Central.
( 22
min )
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]
( 12
min )
This is a guest blog post co-written with Vik Pant and Kyle Bassett from PwC. With organizations increasingly investing in machine learning (ML), ML adoption has become an integral part of business transformation strategies. A recent PwC CEO survey unveiled that 84% of Canadian CEOs agree that artificial intelligence (AI) will significantly change their business […]
( 8
min )
The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of virtually infinite compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are rapidly adopting and using ML technologies to transform their businesses. Just recently, generative AI applications have […]
( 13
min )
Generative AI — in the form of large language model (LLM) applications like ChatGPT, image generators such as Stable Diffusion and Adobe Firefly, and game rendering techniques like NVIDIA DLSS 3 Frame Generation — is rapidly ushering in a new era of computing for productivity, content creation, gaming and more. At the Microsoft Build developer Read article >
( 7
min )
Robotics hardware traditionally requires programmers to deploy it. READY Robotics wants to change that with its “no code” software aimed at people working in manufacturing who haven’t got programming skills. The Columbus, Ohio, startup is a spinout of robotics research from Johns Hopkins University. Kel Guerin was a PhD candidate there leading this research when Read article >
( 6
min )
It’s time to take out the space trash. In this episode of the NVIDIA AI Podcast, host Noah Kravitz dives into an illuminating conversation with Alex Fielding, co-founder and CEO of Privateer Space. Fielding is a tech industry veteran, having previously worked alongside Apple co-founder Steve Wozniak on several projects, and holds a deep expertise Read article >
( 4
min )
TL;DR: Text Prompt -> LLM -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image.
Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, despite their impressive capabilities, diffusion models, such as Stable Diffusion, often struggle to accurately follow the prompts when spatial or common sense reasoning is required.
The following figure lists four scenarios in which Stable Diffusion falls short in generating images that accurately correspond to the given prompts, namely negation, numeracy, attribute assignment, and spatial relationships. In contrast, our method, LLM-grounded Diffusion (LMD), delivers much better prompt understanding in text-to-image gen…
( 3
min )
This machine-learning method could assist with robotic scene understanding, image editing, or online recommendation systems.
( 10
min )
Generative AI is in the midst of a period of stunning growth. Increasingly capable foundation models are being released continuously, with large language models (LLMs) being one of the most visible model classes. LLMs are models composed of billions of parameters trained on extensive corpora of text, up to hundreds of billions or even a […]
( 17
min )
People agree: accelerated computing is energy-efficient computing. The National Energy Research Scientific Computing Center (NERSC), the U.S. Department of Energy’s lead facility for open science, measured results across four of its key high performance computing and AI applications. They clocked how fast the applications ran and how much energy they consumed on CPU-only and GPU-accelerated Read article >
( 5
min )
In this work, we explore Parameter-Efficient-Learning (PEL) techniques to
repurpose a General-Purpose-Speech (GSM) model for Arabic dialect
identification (ADI). Specifically, we investigate different setups to
incorporate trainable features into a multi-layer encoder-decoder GSM
formulation under frozen pre-trained settings. Our architecture includes
residual adapter and model reprogramming (input-prompting). We design a
token-level label mapping to condition the GSM for Arabic Dialect
Identification (ADI). This is challenging due to the high variation in
vocabulary and pronunciation among the numerous regional dialects. We achieve
new state-of-the-art accuracy on the ADI-17 dataset by vanilla fine-tuning. We
further reduce the training budget with the PEL method, which performs within
1.86% of the fine-tuning accuracy while using only 2.5% of (extra) trainable
network parameters. Our study demonstrates how to identify Arabic dialects using a
small dataset and limited computation with open source code and pre-trained
models.
( 2
min )
Machine learning (ML) has revolutionized transportation systems, enabling
autonomous driving and smart traffic services. Federated learning (FL)
overcomes privacy constraints by training ML models in distributed systems,
exchanging model parameters instead of raw data. However, the dynamic states of
connected vehicles affect the network connection quality and influence the FL
performance. To tackle this challenge, we propose a contextual client selection
pipeline that uses Vehicle-to-Everything (V2X) messages to select clients based
on the predicted communication latency. The pipeline includes: (i) fusing V2X
messages, (ii) predicting future traffic topology, (iii) pre-clustering clients
based on local data distribution similarity, and (iv) selecting clients with
minimal latency for future model aggregation. Experiments show that our
pipeline outperforms baselines on various datasets, particularly in non-iid
settings.
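The four-step selection pipeline can be sketched in miniature. The V2X fusion and traffic-topology prediction steps (i)–(ii) are abstracted into a synthetic `predicted_latency_ms` field, and the data-distribution signature used for pre-clustering is a made-up label histogram; this is an illustration of steps (iii)–(iv), not the paper's implementation.

```python
# Hedged sketch of contextual client selection: pre-cluster clients by a
# crude data-distribution signature (step iii), then keep the client with
# the lowest predicted communication latency per cluster (step iv).
# All numbers are synthetic.

clients = [
    {"id": 0, "label_hist": (0.9, 0.1), "predicted_latency_ms": 42.0},
    {"id": 1, "label_hist": (0.8, 0.2), "predicted_latency_ms": 17.0},
    {"id": 2, "label_hist": (0.2, 0.8), "predicted_latency_ms": 55.0},
    {"id": 3, "label_hist": (0.1, 0.9), "predicted_latency_ms": 23.0},
]

def cluster_key(client):
    # Pre-cluster by which label dominates the local data distribution.
    return max(range(len(client["label_hist"])),
               key=lambda i: client["label_hist"][i])

def select_clients(clients):
    # Within each cluster, keep the client with minimal predicted latency.
    best = {}
    for c in clients:
        k = cluster_key(c)
        if k not in best or c["predicted_latency_ms"] < best[k]["predicted_latency_ms"]:
            best[k] = c
    return sorted(cl["id"] for cl in best.values())

selected = select_clients(clients)
```

Clustering first preserves coverage of the non-iid label distributions; picking by latency second keeps aggregation rounds fast.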
( 2
min )
Deep ensembles achieved state-of-the-art results in classification and
out-of-distribution (OOD) detection; however, their effectiveness remains
limited due to the homogeneity of learned patterns within the ensemble. To
overcome this challenge, our study introduces a novel approach that promotes
diversity among ensemble members by leveraging saliency maps. By incorporating
saliency map diversification, our method outperforms conventional ensemble
techniques in multiple classification and OOD detection tasks, while also
improving calibration. Experiments on well-established OpenOOD benchmarks
highlight the potential of our method in practical applications.
( 2
min )
The recent emergence of Self-Supervised Learning (SSL) as a fundamental
paradigm for learning image representations has demonstrated, and continues to
demonstrate, high empirical success in a variety of tasks. However, most SSL approaches fail
to learn embeddings that capture hierarchical semantic concepts that are
separable and interpretable. In this work, we aim to learn highly separable
semantic hierarchical representations by stacking Joint Embedding Architectures
(JEA) where higher-level JEAs are input with representations of lower-level
JEA. This results in a representation space that exhibits distinct
sub-categories of semantic concepts (e.g., model and colour of vehicles) in
higher-level JEAs. We empirically show that representations from stacked JEA
perform on a similar level as traditional JEA with comparative parameter counts
and visualise the representation spaces to validate the semantic hierarchies.
( 2
min )
Neural machine translation (NMT) has become the de-facto standard in
real-world machine translation applications. However, NMT models can
unpredictably produce severely pathological translations, known as
hallucinations, that seriously undermine user trust. It thus becomes crucial to
implement effective preventive strategies to guarantee their proper
functioning. In this paper, we address the problem of hallucination detection
in NMT by following a simple intuition: as hallucinations are detached from the
source content, they exhibit encoder-decoder attention patterns that are
statistically different from those of good quality translations. We frame this
problem with an optimal transport formulation and propose a fully unsupervised,
plug-in detector that can be used with any attention-based NMT model.
Experimental results show that our detector not only outperforms all previous
model-based detectors, but is also competitive with detectors that employ large
models trained on millions of samples.
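The core intuition (hallucinations have atypical source-attention distributions) can be illustrated with a 1-D Wasserstein-1 distance between attention histograms over source positions, computed from cumulative distributions. The attention vectors and threshold below are synthetic illustrations, not the paper's detector or data.

```python
# Minimal sketch: compare a candidate translation's source-attention mass
# against a reference profile with the 1-D Wasserstein-1 distance.

def wasserstein_1d(p, q):
    """W1 between two histograms on the same equally spaced support."""
    assert abs(sum(p) - 1.0) < 1e-9 and abs(sum(q) - 1.0) < 1e-9
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)  # sum of CDF gaps = W1 on a grid
    return dist

# Reference profile: attention spread over the source sentence.
reference = [0.25, 0.25, 0.25, 0.25]
# Hallucination-like profile: mass collapsed onto one position.
candidate = [0.02, 0.02, 0.02, 0.94]

score = wasserstein_1d(reference, candidate)
is_hallucination = score > 0.5  # threshold chosen for illustration only
```

A fully detached translation concentrates its attention mass, which pushes the transport cost up; well-grounded translations stay close to the reference profile.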
( 2
min )
Parkinson's disease (PD) is a neurological disorder impacting a person's
speech. Among automatic PD assessment methods, deep learning models have gained
particular interest. Recently, the community has explored cross-pathology and
cross-language models which can improve diagnostic accuracy even further.
However, strict patient data privacy regulations largely prevent institutions
from sharing patient speech data with each other. In this paper, we employ
federated learning (FL) for PD detection using speech signals from 3 real-world
language corpora of German, Spanish, and Czech, each from a separate
institution. Our results indicate that the FL model outperforms all the local
models in terms of diagnostic accuracy, while not performing very differently
from the model based on centrally combined training sets, with the advantage of
not requiring any data sharing among collaborators. This will simplify
inter-institutional collaborations, resulting in enhancement of patient
outcomes.
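The cross-institution setup above rests on standard federated averaging: each site trains locally and only parameter vectors are shared, weighted by local dataset size. The three "sites" and weight vectors below are synthetic stand-ins for the German, Spanish, and Czech corpora, not the paper's models.

```python
import numpy as np

# FedAvg sketch: aggregate local parameter vectors weighted by the number
# of local samples; no raw speech data ever leaves an institution.

def fedavg(local_weights, n_samples):
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(local_weights, n_samples))

site_weights = [np.array([1.0, 0.0]),   # e.g. German site
                np.array([0.0, 1.0]),   # e.g. Spanish site
                np.array([1.0, 1.0])]   # e.g. Czech site
global_w = fedavg(site_weights, n_samples=[100, 100, 200])
```

In practice the server repeats this aggregation over many rounds, broadcasting `global_w` back to the sites between rounds.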
( 2
min )
We present a machine learning approach to aftershock forecasting on the
Japanese earthquake catalogue from 2015 to 2019. Our method takes as sole input
the ground surface deformation as measured by Global Positioning System (GPS)
stations at the day of the mainshock, and processes it with a Convolutional
Neural Network (CNN), thus capturing the input's spatial correlations. Despite
the moderate amount of data the performance of this new approach is very
promising. The accuracy of the prediction heavily relies on the density of GPS
stations: the predictive power is lost when the mainshocks occur far from
measurement stations, as in offshore regions.
( 2
min )
We describe a proof-of-principle implementation of a system for drawing
melodies that abstracts away from a note-level input representation via melodic
contours. The aim is to allow users to express their musical intentions without
requiring prior knowledge of how notes fit together melodiously. Current
approaches to controllable melody generation often require users to choose
parameters that are static across a whole sequence, via buttons or sliders. In
contrast, our method allows users to quickly specify how parameters should
change over time by drawing a contour.
( 2
min )
In this paper, we investigate the complexity of feed-forward neural networks
by examining the concept of functional equivalence, which suggests that
different network parameterizations can lead to the same function. We utilize
the permutation invariance property to derive a novel covering number bound for
the class of feedforward neural networks, which reveals that the complexity of
a neural network can be reduced by exploiting this property. Furthermore, based
on the symmetric structure of parameter space, we demonstrate that an
appropriate strategy of random parameter initialization can increase the
probability of convergence for optimization. We found that overparameterized
networks tend to be easier to train in the sense that increasing the width of
neural networks leads to a vanishing volume of the effective parameter space.
Our findings offer new insights into overparameterization and have significant
implications for understanding generalization and optimization in deep
learning.
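The permutation-invariance property the abstract builds on is easy to verify numerically: permuting the hidden units of a one-hidden-layer ReLU network (rows of W1, entries of b1, and the matching columns of W2) gives a different parameterization of the exact same function. A minimal check:

```python
import numpy as np

# Numerical check of permutation invariance in a feed-forward network.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def net(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permute hidden units consistently across the two layers.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1p, b1p, W2p, b2))
```

Each of the d_hidden! such permutations yields a distinct parameter vector with identical outputs, which is exactly the redundancy the covering-number argument exploits.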
( 2
min )
Identifying molecules that exhibit some pre-specified properties is a
difficult problem to solve. In the last few years, deep generative models have
been used for molecule generation. Deep Graph Variational Autoencoders are
among the most powerful machine learning tools with which it is possible to
address this problem. However, existing methods struggle in capturing the true
data distribution and tend to be computationally expensive. In this work, we
propose RGCVAE, an efficient and effective Graph Variational Autoencoder based
on: (i) an encoding network exploiting a new powerful Relational Graph
Isomorphism Network; (ii) a novel probabilistic decoding component. Compared to
several state-of-the-art VAE methods on two widely adopted datasets, RGCVAE
shows state-of-the-art molecule generation performance while being
significantly faster to train.
( 2
min )
This work presents a comprehensive analysis to regularize the Soft
Actor-Critic (SAC) algorithm with automatic temperature adjustment. The policy
evaluation, the policy improvement and the temperature adjustment are
reformulated, addressing certain modifications and stating the original theory
in a more explicit manner.
( 2
min )
This paper presents a community-centered study of cultural limitations of
text-to-image (T2I) models in the South Asian context. We theorize these
failures using scholarship on dominant media regimes of representations and
locate them within participants' reporting of their existing social
marginalizations. We thus show how generative AI can reproduce an outsider's
gaze for viewing South Asian cultures, shaped by global and regional power
inequities. By centering communities as experts and soliciting their
perspectives on T2I limitations, our study adds rich nuance into existing
evaluative frameworks and deepens our understanding of the culturally-specific
ways AI technologies can fail in non-Western and Global South settings. We
distill lessons for responsible development of T2I models, recommending
concrete pathways forward that can allow for recognition of structural
inequalities.
( 2
min )
The recent rapid progress in pre-training Large Language Models has relied on
using self-supervised language modeling objectives like next token prediction
or span corruption. On the other hand, Machine Translation Systems are mostly
trained using cross-lingual supervision that requires aligned data between
source and target languages. We demonstrate that pre-training Large Language
Models on a mixture of a self-supervised Language Modeling objective and the
supervised Machine Translation objective, therefore including cross-lingual
parallel data during pre-training, yields models with better in-context
learning abilities. As pre-training is a very resource-intensive process and a
grid search on the best mixing ratio between the two objectives is
prohibitively expensive, we propose a simple yet effective strategy to learn it
during pre-training.
( 2
min )
In this paper, we study the statistical efficiency of Reinforcement Learning
in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general function
approximation. We introduce a new concept called Mean-Field Model-Based Eluder
Dimension (MBED), which subsumes a rich family of Mean-Field RL problems.
Additionally, we propose algorithms based on Optimistic Maximal Likelihood
Estimation, which can return an $\epsilon$-optimal policy for MFC or an
$\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial
w.r.t. relevant parameters and independent of the number of states, actions and
the number of agents. Notably, our results only require a mild assumption of
Lipschitz continuity on transition dynamics and avoid strong structural
assumptions in previous work. Finally, in the tabular setting, given the access
to a generative model, we establish an exponential lower bound for MFC setting,
while providing a novel sample-efficient model elimination algorithm to
approximate equilibrium in MFG setting. Our results reveal a fundamental
separation between RL for single-agent, MFC, and MFG from the sample efficiency
perspective.
( 2
min )
A Markov network characterizes the conditional independence structure, or
Markov property, among a set of random variables. Existing work focuses on
specific families of distributions (e.g., exponential families) and/or certain
structures of graphs, and most of them can only handle variables of a single
data type (continuous or discrete). In this work, we characterize the
conditional independence structure in general distributions for all data types
(i.e., continuous, discrete, and mixed-type) with a Generalized Precision
Matrix (GPM). Besides, we also allow general functional relations among
variables, thus giving rise to a Markov network structure learning algorithm in
one of the most general settings. To deal with the computational challenge of
the problem, especially for large graphs, we unify all cases under the same
umbrella of a regularized score matching framework. We validate the theoretical
results and demonstrate the scalability empirically in various settings.
( 2
min )
We consider the community recovery problem on a multilayer variant of the
hypergraph stochastic block model (HSBM). Each layer is associated with an
independent realization of a d-uniform HSBM on N vertices. Given the similarity
matrix containing the aggregated number of hyperedges incident to each pair of
vertices, the goal is to obtain a partition of the N vertices into disjoint
communities. In this work, we investigate a semidefinite programming (SDP)
approach and obtain information-theoretic conditions on the model parameters
that guarantee exact recovery both in the assortative and the disassortative
cases.
( 2
min )
A fundamental problem of causal discovery is cause-effect inference, learning
the correct causal direction between two random variables. Significant progress
has been made through modelling the effect as a function of its cause and a
noise term, which allows us to leverage assumptions about the generating
function class. The recently introduced heteroscedastic location-scale noise
functional models (LSNMs) combine expressive power with identifiability
guarantees. LSNM model selection based on maximizing likelihood achieves
state-of-the-art accuracy, when the noise distributions are correctly
specified. However, through an extensive empirical evaluation, we demonstrate
that the accuracy deteriorates sharply when the form of the noise distribution
is misspecified by the user. Our analysis shows that the failure occurs mainly
when the conditional variance in the anti-causal direction is smaller than that
in the causal direction. As an alternative, we find that causal model selection
through residual independence testing is much more robust to noise
misspecification and misleading conditional variance.
( 2
min )
We propose a new multimodal variational autoencoder that enables generation
from the joint distribution and conditionally on any number of complex
modalities. The unimodal posteriors are conditioned on the Deep Canonical
Correlation Analysis embeddings which preserve the shared information across
modalities leading to more coherent cross-modal generations. Furthermore, we
use Normalizing Flows to enrich the unimodal posteriors and achieve more
diverse data generation. Finally, we propose to use a Product of Experts for
inferring one modality from several others which makes the model scalable to
any number of modalities. We demonstrate that our method improves likelihood
estimates, diversity of the generations and in particular coherence metrics in
the conditional generations on several datasets.
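The Product-of-Experts step mentioned above has a standard closed form for diagonal-Gaussian posteriors: the joint precision is the sum of the unimodal precisions, and the joint mean is the precision-weighted average of the unimodal means. The sketch below shows only this PoE combination; the DCCA embeddings and normalizing flows of the full model are omitted.

```python
import numpy as np

# Standard PoE combination of diagonal-Gaussian unimodal posteriors.
def product_of_experts(mus, logvars):
    """mus, logvars: lists of (d,) arrays, one per available modality."""
    precisions = [np.exp(-lv) for lv in logvars]
    prec = sum(precisions)                       # joint precision
    mu = sum(p * m for p, m in zip(precisions, mus)) / prec
    return mu, -np.log(prec)                     # joint mean, joint logvar

mu_a, lv_a = np.array([0.0, 2.0]), np.array([0.0, 0.0])   # variance 1
mu_b, lv_b = np.array([2.0, 2.0]), np.array([0.0, 0.0])   # variance 1
mu, lv = product_of_experts([mu_a, mu_b], [lv_a, lv_b])
# With equal precisions, the joint mean is the average and the variance halves.
```

Because any subset of experts can be multiplied together, the same formula handles inference from one modality, several, or all of them, which is what makes the model scale to any number of modalities.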
( 2
min )
Senior Ananya Gurumurthy adds her musical talents to her math and computer science studies to advocate using data for social change.
( 9
min )
In this paper, we present a robust incremental learning model for regression
tasks on temporal tabular datasets. Using commonly available tabular and
time-series prediction models as building blocks, a machine-learning model is
built incrementally to adapt to distributional shifts in data. Using the
concept of self-similarity, the model relies on only two basic building blocks,
gradient boosting decision trees and neural networks, to build models of any
required complexity. The model is efficient as no
specialised neural architectures are used and each model building block can be
independently trained in parallel. The model is demonstrated to perform
robustly under adverse conditions such as regime changes, fat-tailed
distributions and low signal-to-noise ratios. Model robustness is studied
under different hyper-parameters and complexities.
( 2
min )
If you are curious about the ethical considerations and debates surrounding AI-generated art, then this blog post is for you. I will be…
( 18
min )
The transportation and logistics industry has undergone a massive change with the introduction of artificial intelligence. After the…
( 13
min )
The banking sector is one of the most significant industries and is heavily dependent on technology to meet customer needs, build customer…
( 12
min )
Vision loss comes in various forms. For some, it’s from birth, for others, it’s a slow descent over time which comes with many expiration dates: The day you can’t see pictures, recognize yourself, or loved ones faces or even read your mail. In our previous blogpost Enable the Visually Impaired to Hear Documents using Amazon […]
( 9
min )
We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for
the countries of Brazil, Germany, India and Kenya, to aid training and
interpretability of models. We demonstrate how our lexicon can be used to
interpret model predictions, showing that models developed to classify extreme
speech rely heavily on target words when making predictions. Further, we
propose a method to aid shot selection for training in low-resource settings
via HATELEXICON. In few-shot learning, the selection of shots is of paramount
importance to model performance. In our work, we simulate a few-shot setting
for German and Hindi, using HASOC data for training and the Multilingual
HateCheck (MHC) as a benchmark. We show that selecting shots based on our
lexicon leads to models performing better on MHC than models trained on shots
sampled randomly. Thus, when given only a few training examples, using our
lexicon to select shots containing more sociocultural information leads to
better few-shot performance.
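The shot-selection idea can be sketched very simply: score each candidate training example by how many lexicon terms it contains and keep the top-k, instead of sampling shots at random. The lexicon and example strings below are innocuous placeholders, not HATELEXICON entries.

```python
# Lexicon-guided shot selection sketch: prefer candidates containing more
# lexicon terms (a proxy for sociocultural information), instead of
# sampling shots uniformly at random.

lexicon = {"termA", "termB", "termC"}

candidates = [
    "no relevant terms here",
    "contains termA only",
    "termB and termC together",
]

def select_shots(candidates, lexicon, k=2):
    scored = sorted(candidates,
                    key=lambda s: sum(t in s.split() for t in lexicon),
                    reverse=True)
    return scored[:k]

shots = select_shots(candidates, lexicon)
```

In the few-shot regime every example counts, so even this crude lexical score changes which examples reach the model.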
( 2
min )
Entropy measures are effective features for time series classification
problems. Traditional entropy measures, such as Shannon entropy, use a
probability distribution function. However, for the effective separation of
time series, new entropy estimation methods are required to characterize the
chaotic dynamic of the system. Our concept of Neural Network Entropy (NNetEn)
is based on the classification of special datasets in relation to the entropy
of the time series recorded in the reservoir of the neural network. NNetEn
estimates the chaotic dynamics of time series in an original way and does not
take into account probability distribution functions. We propose two new
classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of
NNetEn is verified on separation of two chaotic time series of sine mapping
using dispersion analysis. For two close dynamic time series (r = 1.1918 and r
= 1.2243), the F-ratio has reached the value of 124 and reflects high
efficiency of the introduced method in classification problems. The
electroencephalography signal classification for healthy persons and patients
with Alzheimer's disease illustrates the practical application of the NNetEn
features. Our computations demonstrate the synergistic effect of increasing
classification accuracy when applying traditional entropy measures and the
NNetEn concept conjointly. An implementation of the algorithms in Python is
presented.
( 3
min )
We study the gradients of a maxout network with respect to inputs and
parameters and obtain bounds for the moments depending on the architecture and
the parameter distribution. We observe that the distribution of the
input-output Jacobian depends on the input, which complicates a stable
parameter initialization. Based on the moments of the gradients, we formulate
parameter initialization strategies that avoid vanishing and exploding
gradients in wide networks. Experiments with deep fully-connected and
convolutional networks show that this strategy improves SGD and Adam training
of deep maxout networks. In addition, we obtain refined bounds on the expected
number of linear regions, results on the expected curve length distortion, and
results on the NTK.
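To make the object of study concrete, here is a plain maxout layer: each unit outputs the maximum over several linear pre-activations. The initialization scale below is a generic fan-in heuristic, not the paper's derived constants (those follow from its moment bounds).

```python
import numpy as np

# A maxout layer: each unit takes the max over `rank` affine maps.
rng = np.random.default_rng(1)

def maxout_layer(x, W, b):
    """x: (fan_in,), W: (units, rank, fan_in), b: (units, rank)."""
    pre = np.einsum("urf,f->ur", W, x) + b  # all pre-activations
    return pre.max(axis=1)                  # max over the rank axis

fan_in, units, rank = 8, 4, 3
# Generic 1/sqrt(fan_in) scaling -- a placeholder, not the paper's scheme.
W = rng.normal(scale=1.0 / np.sqrt(fan_in), size=(units, rank, fan_in))
b = np.zeros((units, rank))

y = maxout_layer(rng.normal(size=fan_in), W, b)
```

The max over affine maps is what makes the input-output Jacobian input-dependent: which linear piece is active changes with x, complicating a one-size-fits-all initialization.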
( 2
min )
We prove that Wasserstein inverse reinforcement learning enables the learner's
reward values to imitate the expert's reward values within a finite number of
iterations for multi-objective optimizations. Moreover, we prove that
Wasserstein inverse reinforcement learning enables the learner's optimal
solutions to imitate the expert's optimal solutions for multi-objective
optimizations with lexicographic order.
( 2
min )
A solar sail is one of the most promising space exploration systems because of
its theoretically infinite specific impulse using solar radiation pressure
(SRP). Recently, some researchers proposed "transformable spacecrafts" that can
actively reconfigure their body configurations with actuatable joints. Used
like solar sails, transformable spacecraft are expected to greatly enhance
orbit and attitude control capability due to their high redundancy in control
degrees of freedom. However, their large number of inputs poses difficulties
in control, and therefore, previous researchers imposed strong constraints to
limit their potential control capabilities. This paper addresses novel
attitude control techniques for transformable spacecraft under SRP. We propose
two methods: a joint angle optimization to acquire arbitrary SRP force and
torque, and a momentum damping control driven by joint angle actuation. Our
methods are formulated in general forms and applicable to any transformable
spacecraft whose bodies each have a front face that dominantly receives SRP.
The validity of the proposed methods is confirmed by numerical simulations.
This paper contributes to making the most of the high control redundancy of
transformable spacecraft without consuming any expendable propellant, which is
expected to greatly enhance orbit and attitude control capability.
( 3
min )
Although deep neural network (DNN)-based speech enhancement (SE) methods
outperform the previous non-DNN-based ones, they often degrade the perceptual
quality of generated outputs. To tackle this problem, we introduce a DNN-based
generative refiner, Diffiner, aiming to improve the perceptual quality of speech
pre-processed by an SE method. We train a diffusion-based generative model by
utilizing a dataset consisting of clean speech only. Then, our refiner
effectively mixes clean parts newly generated via denoising diffusion
restoration into the degraded and distorted parts caused by a preceding SE
method, resulting in refined speech. Once our refiner is trained on a set of
clean speech, it can be applied to various SE methods without additional
training specialized for each SE module. Therefore, our refiner can be a
versatile post-processing module w.r.t. SE methods and has high potential in
terms of modularity. Experimental results show that our method improved
perceptual speech quality regardless of the preceding SE methods used.
( 2
min )
Soft actor-critic is a successful successor to soft Q-learning. While both live
under the maximum entropy framework, their relationship is still unclear. In this
paper, we prove that in the limit they converge to the same solution. This is
appealing since it translates the optimization from an arduous to an easier
way. The same justification can also be applied to other regularizers such as
KL divergence.
( 2
min )
Contemporary face detection algorithms have to deal with many challenges such
as variations in pose, illumination, and scale. A subclass of the face
detection problem that has recently gained increasing attention is occluded
face detection, or more specifically, the detection of masked faces. Three
years on since the advent of the COVID-19 pandemic, there is still a complete
lack of evidence regarding how well existing face detection algorithms perform
on masked faces. This article first offers a brief review of state-of-the-art
face detectors and detectors made for the masked face problem, along with a
review of the existing masked face datasets. We evaluate and compare the
performances of a well-representative set of face detectors at masked face
detection and conclude with a discussion on the possible contributing factors
to their performance.
( 2
min )
Large Language Models (LLMs) have made remarkable strides in natural language
processing, but their expanding size poses challenges in terms of computational
expense and inefficiency. Conversely, Small Language Models (SLMs) are known
for their efficiency but often encounter difficulties in tasks with limited
capacity and training data, particularly in domain-specific scenarios. In this
paper, we introduce Dr. LLaMA, a method that improves SLMs in the medical
domain through generative data augmentation utilizing LLMs. The objective is to
develop more efficient and capable models tailored for specialized
applications. Our preliminary results on the PubMedQA dataset demonstrate that
LLMs effectively refine and diversify existing question-answer pairs, leading
to improved performance of a significantly smaller model after fine-tuning. The
best SLM surpasses few-shot GPT-4 on PubMedQA with under 1.6 billion
parameters. Our code and generated data are publicly available to facilitate
further explorations.
( 2
min )
We propose the first black-box targeted attack against online deep
reinforcement learning through reward poisoning during training time. Our
attack is applicable to general environments with unknown dynamics learned by
unknown algorithms and requires limited attack budgets and computational
resources. We leverage a general framework and find conditions to ensure
efficient attack under a general assumption of the learning algorithms. We show
that our attack is optimal in our framework under the conditions. We
experimentally verify that with limited budgets, our attack efficiently leads
the learning agent to various target policies under a diverse set of popular
DRL environments and state-of-the-art learners.
( 2
min )
Vector embeddings have been successfully applied in several domains to obtain
effective representations of non-numeric data which can then be used in various
downstream tasks. We present a novel application of vector embeddings in
professional road cycling by demonstrating a method to learn representations
for riders and races based on historical results. We use unsupervised learning
techniques to validate that the resultant embeddings capture interesting
features of riders and races. These embeddings could be used for downstream
prediction tasks such as early talent identification and race outcome
prediction.
( 2
min )
We investigate a primal-dual (PD) method for the saddle point problem (SPP)
that uses a linear approximation of the primal function instead of the standard
proximal step, resulting in a linearized PD (LPD) method. For convex-strongly
concave SPP, we observe that the LPD method has a suboptimal dependence on the
Lipschitz constant of the primal function. To fix this issue, we combine
features of Accelerated Gradient Descent with the LPD method resulting in a
single-loop Accelerated Linearized Primal-Dual (ALPD) method. ALPD method
achieves the optimal gradient complexity when the SPP has a semi-linear
coupling function. We also present an inexact ALPD method for SPPs with a
general nonlinear coupling function that maintains the optimal gradient
evaluations of the primal parts and significantly improves the gradient
evaluations of the coupling term compared to the ALPD method. We verify our
findings with numerical experiments.
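A toy instance shows the linearized primal-dual template (gradient step on the primal in place of a proximal step, ascent step on the dual, with extrapolation). The problem below is the bilinear saddle point min_x max_y 0.5||x||^2 + y^T(Ax - b), whose solution enforces Ax = b; this is the generic LPD iteration, not the accelerated ALPD variant from the abstract.

```python
import numpy as np

# Linearized primal-dual iteration on a toy convex-concave saddle point:
#   min_x max_y  0.5*||x||^2 + y^T (A x - b)
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
x, y = np.zeros(2), np.zeros(2)
tau, sigma = 0.4, 0.4        # step sizes with tau*sigma*||A||^2 < 1

for _ in range(2000):
    x_new = x - tau * (x + A.T @ y)            # linearized primal step
    y = y + sigma * (A @ (2 * x_new - x) - b)  # dual ascent, extrapolated
    x = x_new

# At the saddle point the constraint A x = b holds (here x -> b, y -> -b).
```

The linearization is what trades the exact primal proximal step for a cheap gradient step, the source of the suboptimal Lipschitz dependence that ALPD then repairs with acceleration.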
( 2
min )
Uncertainty-quantification methods are applied to estimate the confidence of
deep-neural-network classifiers over their predictions. However, most widely
used methods are known to be overconfident. We address this problem by
developing an algorithm that exploits the latent-space representation of data
points fed into the network, to assess the accuracy of their prediction. Using
the latent-space representation generated by the fraction of training set that
the network classifies correctly, we build a statistical model that is able to
capture the likelihood of a given prediction. We show on a synthetic dataset
that commonly used methods are mostly overconfident. Overconfidence occurs also
for predictions made on data points that are outside the distribution that
generated the training data. In contrast, our method can detect such
out-of-distribution data points as inaccurately predicted, thus aiding in the
automatic detection of outliers.
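The latent-space idea can be sketched with a simple density model: fit a Gaussian to the latent representations of correctly classified training points, then flag test points whose Mahalanobis distance from that density is large. A real network's penultimate-layer features are replaced here by synthetic 2-D data, and the threshold is illustrative only.

```python
import numpy as np

# Fit a Gaussian to "correctly classified" latent points, then score
# new points by squared Mahalanobis distance to that density.
rng = np.random.default_rng(0)
latent_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

mu = latent_train.mean(axis=0)
cov = np.cov(latent_train.T)
cov_inv = np.linalg.inv(cov)

def mahalanobis2(z):
    d = z - mu
    return float(d @ cov_inv @ d)

in_dist = np.array([0.1, -0.2])   # near the training density
far_out = np.array([8.0, 8.0])    # far outside it
threshold = 9.21                  # chi^2(2 dof) 99% cut, illustrative

assert mahalanobis2(in_dist) < threshold < mahalanobis2(far_out)
```

Points far from the training density get low likelihood regardless of how confident the softmax is, which is how the method catches overconfident out-of-distribution predictions.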
( 2
min )
We address the problem of denoising data from a Gaussian mixture using a
two-layer non-linear autoencoder with tied weights and a skip connection. We
consider the high-dimensional limit where the number of training samples and
the input dimension jointly tend to infinity while the number of hidden units
remains bounded. We provide closed-form expressions for the denoising
mean-squared test error. Building on this result, we quantitatively
characterize the advantage of the considered architecture over the autoencoder
without the skip connection that relates closely to principal component
analysis. We further show that our results accurately capture the learning
curves on a range of real data sets.
( 2
min )
Thousands of crimes happen daily all around us, but statistics are kept for
only a few of them, so crime rates keep increasing; the reason may be a lack
of concern about, or records of, previous crimes. Observing past crime
statistics matters to the general public when making outing decisions, to
police taking steps to restrain crime and catch criminals, and to tourists
making travel decisions. The National Institute of Justice releases crime
survey data for the country, but does not offer crime statistics down to the
Union or Thana level. Considering all of these cases, we propose an approach
that gives people an approximation of the safety of a specific location, with
a crime ranking of different areas, the crimes located on a map, and a
mechanism for predicting future crime occurrences. Our approach relies on
crawling crime data from different online Bangla newspapers, stemming and
keyword extraction, a location-finding algorithm, cosine similarity, a naive
Bayes classifier, and a custom crime prediction model.
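The cosine-similarity step of such a pipeline can be sketched as follows; the term-frequency vectors here are hypothetical stand-ins for vectors extracted from crawled articles, not the paper's actual features.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

# Hypothetical bag-of-words counts for two crawled news articles.
doc_a = np.array([3.0, 0.0, 1.0, 2.0])
doc_b = np.array([1.0, 1.0, 0.0, 1.0])
similarity = cosine_similarity(doc_a, doc_b)
```

In the described system, a similarity score like this could be used to match a new article against previously extracted crime reports before classification.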
( 2
min )
We investigate how sparse neural activity affects the generalization
performance of a deep Bayesian neural network at the large width limit. To this
end, we derive a neural network Gaussian Process (NNGP) kernel with rectified
linear unit (ReLU) activation and a predetermined fraction of active neurons.
Using the NNGP kernel, we observe that the sparser networks outperform the
non-sparse networks at shallow depths on a variety of datasets. We validate
this observation by extending the existing theory on the generalization error
of kernel-ridge regression.
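For context, the standard (dense) ReLU NNGP kernel recursion that the paper's sparse variant builds on can be written as a per-layer update; this is the classical arc-cosine kernel, not the sparsity-modified kernel derived in the paper.

```python
import numpy as np

def relu_nngp_step(kxx, kxy, kyy, sigma_w2=2.0, sigma_b2=0.0):
    """One layer of the standard ReLU NNGP kernel recursion
    (the arc-cosine kernel). The paper's sparse variant modifies
    this recursion via the fraction of active neurons."""
    cos_t = np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0)
    theta = np.arccos(cos_t)
    j1 = np.sin(theta) + (np.pi - theta) * np.cos(theta)
    kxy_new = sigma_b2 + sigma_w2 / (2.0 * np.pi) * np.sqrt(kxx * kyy) * j1
    kxx_new = sigma_b2 + sigma_w2 / 2.0 * kxx
    kyy_new = sigma_b2 + sigma_w2 / 2.0 * kyy
    return kxx_new, kxy_new, kyy_new

# With sigma_w2 = 2 and no bias, the diagonal is preserved layer to layer.
kxx, kxy, kyy = relu_nngp_step(1.0, 0.5, 1.0)
```

Iterating this map gives the depth-dependent kernel on which generalization of kernel-ridge regression can then be analyzed.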
( 2
min )
Dynamic control flow is an important technique often used to design
expressive and efficient deep learning computations for applications such as
text parsing, machine translation, early exit from deep models, and so on.
However, the resulting control flow divergence makes batching, an important
performance optimization, difficult to perform manually. In this paper, we
present ACRoBat, a framework that enables efficient automatic batching for
dynamic deep learning computations by performing hybrid static+dynamic compiler
optimizations and end-to-end tensor code generation. ACRoBat performs up to
8.5X better than DyNet, a state-of-the-art framework for automatic batching, on
an Nvidia GeForce RTX 3070 GPU.
( 2
min )
The core purpose of developing artificial neural networks was to mimic the
functionalities of biological neural networks. However, unlike biological
neural networks, traditional artificial neural networks are often structured
hierarchically, which can impede the flow of information between neurons as the
neurons in the same layer have no connections between them. Hence, we propose a
more robust model of artificial neural networks where the hidden neurons,
residing in the same hidden layer, are interconnected, enabling the neurons to
learn complex patterns and speeding up the convergence rate. With the
experimental study of our proposed model as fully connected layers in shallow
and deep networks, we demonstrate that the model results in a significant
increase in convergence rate.
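The idea of lateral connections within a hidden layer can be sketched with a few fixed-point iterations; this is an illustrative simplification, not the paper's exact formulation.

```python
import numpy as np

def intra_connected_layer(x, W, L, b, steps=3):
    """Hidden layer whose units are also connected to one another:
    a few fixed-point iterations of h <- relu(W x + L h + b), where L
    holds the lateral (intra-layer) weights. A sketch of the idea only."""
    h = np.zeros(W.shape[0])
    for _ in range(steps):
        h = np.maximum(0.0, W @ x + L @ h + b)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
b = np.zeros(3)
L = 0.1 * rng.standard_normal((3, 3))
np.fill_diagonal(L, 0.0)  # lateral connections only, no self-loops
h = intra_connected_layer(x, W, L, b)
```

Setting the lateral matrix to zero recovers a standard feed-forward ReLU layer, which makes the comparison in the abstract concrete.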
( 2
min )
Most phenomena related to biomedical tasks are inherently complex, and in
many cases, are expressed as signals on biomedical Knowledge Graphs (KGs). In
this work, we introduce the use of a new representation framework, the Prime
Adjacency Matrix (PAM) for biomedical KGs, which allows for very efficient
network analysis. PAM utilizes prime numbers to enable representing the whole
KG with a single adjacency matrix and the fast computation of multiple
properties of the network. We illustrate the applicability of the framework in
the biomedical domain by working on different biomedical knowledge graphs and
by providing two case studies: one on drug-repurposing for COVID-19 and one on
important metapath extraction. We show that we achieve better results than the
original proposed workflows, using very simple methods that require no
training, in considerably less time.
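The core trick behind a prime-based adjacency matrix can be illustrated on a toy graph; the relation names and primes below are made-up examples, and the paper's full construction has more machinery than this sketch.

```python
import numpy as np

# Toy KG: 3 nodes, two relation types mapped to distinct primes. One
# matrix then holds all relation types at once, and entries of matrix
# powers factor into the primes of the relations along connecting paths.
PRIMES = {"treats": 2, "binds": 3}  # hypothetical relation-to-prime map

A = np.zeros((3, 3), dtype=np.int64)
A[0, 1] = PRIMES["treats"]   # node 0 -treats-> node 1
A[1, 2] = PRIMES["binds"]    # node 1 -binds-> node 2

A2 = A @ A
# A2[0, 2] == 2 * 3: a length-2 path 0 -treats-> 1 -binds-> 2, and the
# factorization 6 = 2 * 3 recovers which relations compose the path.
```

This is why a single matrix power suffices to read off multi-relation paths (e.g. candidate metapaths for drug repurposing) without training.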
( 2
min )
Off-policy reinforcement learning has been a driving force behind
state-of-the-art conversational AIs, leading to more natural human-agent
interactions and improving user satisfaction for goal-oriented agents.
However, in large-scale commercial settings, it is often challenging to
balance policy improvements with experience continuity across the broad
spectrum of applications handled by such systems. In the literature,
off-policy evaluation and guard-railing on aggregate statistics have been
commonly used to address
this problem. In this paper, we propose a method for curating and leveraging
high-precision samples sourced from historical regression incident reports to
validate, safe-guard, and improve policies prior to the online deployment. We
conducted extensive experiments using data from a real-world conversational
system and actual regression incidents. The proposed method is currently
deployed in our production system to protect customers against broken
experiences and enable long-term policy improvements.
( 2
min )
We prove a universal approximation property (UAP) for a class of ODENet and a
class of ResNet, which are simplified mathematical models for deep learning
systems with skip connections. The UAP can be stated as follows. Let $n$ and
$m$ be the dimension of input and output data, and assume $m\leq n$. Then we
show that ODENet of width $n+m$ with any non-polynomial continuous activation
function can approximate any continuous function on a compact subset on
$\mathbb{R}^n$. We also show that ResNet has the same property as the depth
tends to infinity. Furthermore, we derive the gradient of a loss function
explicitly with respect to a certain tuning variable. We use this to construct
a learning algorithm for ODENet. To demonstrate the usefulness of this
algorithm, we apply it to a regression problem, a binary classification, and a
multinomial classification in MNIST.
( 2
min )
We study the gradients of a maxout network with respect to inputs and
parameters and obtain bounds for the moments depending on the architecture and
the parameter distribution. We observe that the distribution of the
input-output Jacobian depends on the input, which complicates a stable
parameter initialization. Based on the moments of the gradients, we formulate
parameter initialization strategies that avoid vanishing and exploding
gradients in wide networks. Experiments with deep fully-connected and
convolutional networks show that this strategy improves SGD and Adam training
of deep maxout networks. In addition, we obtain refined bounds on the expected
number of linear regions, results on the expected curve length distortion, and
results on the NTK.
( 2
min )
Unifying semi-supervised learning (SSL) and open-set recognition into a
single learning policy would facilitate the development of cost-efficient and
application-grade classifiers. However, previous attempts do not clarify the
difference between unobserved novel categories (those only seen during testing)
and observed novel categories (those present in unlabelled training data). This
study introduces Open-Set Learning with Augmented Category by Exploiting
Unlabelled Data (Open-LACU), the first policy that generalises between both
novel category types. We adapt the state-of-the-art OSR method of Margin
Generative Adversarial Networks (Margin-GANs) into several Open-LACU
configurations, setting the benchmarks for Open-LACU and offering unique
insights into novelty detection using Margin-GANs. Finally, we highlight the
significance of the Open-LACU policy by discussing the applications of semantic
segmentation in remote sensing, object detection in radiology and disease
identification through cough analysis. These applications include observed and
unobserved novel categories, making Open-LACU essential for training
classifiers in these big data domains.
( 2
min )
Considering the case where the response variable is a categorical variable
and the predictor is a random function, two novel functional sufficient
dimensional reduction (FSDR) methods are proposed based on mutual information
and square loss mutual information. Compared to the classical FSDR methods,
such as functional sliced inverse regression and functional sliced average
variance estimation, the proposed methods are appealing because they are
capable of estimating multiple effective dimension reduction directions in the
case of a relatively small number of categories, especially for the binary
response. Moreover, the proposed methods do not require the restrictive linear
conditional mean assumption and the constant covariance assumption. They avoid
the inverse problem of the covariance operator which is often encountered in
the functional sufficient dimension reduction. Functional principal component
analysis with truncation is used as a regularization mechanism. Under
some mild conditions, the statistical consistency of the proposed methods is
established. It is demonstrated that the two methods are competitive compared
with some existing FSDR methods by simulations and real data analyses.
( 2
min )
We focus on the task of learning a single index model $\sigma(w^\star \cdot
x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions.
Prior work has shown that the sample complexity of learning $w^\star$ is
governed by the information exponent $k^\star$ of the link function $\sigma$,
which is defined as the index of the first nonzero Hermite coefficient of
$\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples
suffice for learning $w^\star$ and that this is tight for online SGD. However,
the CSQ lower bound for gradient-based methods only shows that $n \gtrsim
d^{k^\star/2}$ samples are necessary. In this work, we close the gap between
the upper and lower bounds by showing that online SGD on a smoothed loss learns
$w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to
statistical analyses of tensor PCA and to the implicit regularization effects
of minibatch SGD on empirical losses.
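The information exponent in question can be estimated numerically: expand the link function in the probabilists' Hermite basis under the standard Gaussian and return the index of the first nonzero coefficient. This is a generic sketch of the definition, not code from the paper.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def information_exponent(sigma, kmax=10, tol=1e-8):
    """Index of the first nonzero Hermite coefficient of the link sigma,
    estimated by Gauss-Hermite quadrature w.r.t. the standard Gaussian.
    (The 1/k! normalisation is dropped: it does not change which
    coefficient is the first nonzero one.)"""
    x, w = He.hermegauss(64)  # nodes/weights for weight exp(-x^2/2)
    fx = sigma(x)
    for k in range(kmax + 1):
        hek = He.hermeval(x, [0.0] * k + [1.0])  # He_k at the nodes
        ck = np.sum(w * fx * hek) / np.sqrt(2.0 * np.pi)
        if abs(ck) > tol:
            return k
    return None
```

For example, the centered square link $z^2 - 1$ has exponent 2, so the abstract's result gives an $n \gtrsim d$ sample complexity for it, versus $n \gtrsim d$ already being tight for exponent-1 links like $\tanh$.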
( 2
min )
Following up on a previous analysis of graph embeddings, we generalize and
expand some results to the general setting of vector symbolic architectures
(VSA) and hyperdimensional computing (HDC). Importantly, we explore the
mathematical relationship between superposition, orthogonality, and tensor
product. We establish the tensor product representation as the central
representation, with a suite of unique properties. These include it being the
most general and expressive representation, as well as being the most
compressed representation that has errorless unbinding and detection.
( 2
min )
If you’re interested in stateful stream processing and the capabilities it provides, you may have heard of Apache Flink®. It’s well-known for its ability to perform stateful stream processing, but for beginners, it can be a daunting task to get started. So here, we’ll explore the basics of Apache Flink by showing you how to… Read More »Getting Started with Apache Flink: First steps to Stateful Stream Processing
The post Getting Started with Apache Flink: First steps to Stateful Stream Processing appeared first on Data Science Central.
( 22
min )
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. Amazon Kendra can pull together […]
( 9
min )
Microsoft Editor provides AI-powered writing assistance to millions of users around the world. One of its features that writers of all levels and domains rely on is the grammar checker, which detects grammar errors in a user’s writing and offers suggested corrections and explanations of the detected errors. The technology behind grammar checker has evolved […]
The post Achieving Zero-COGS with Microsoft Editor Neural Grammar Checker appeared first on Microsoft Research.
( 17
min )
Scientific researchers need massive computational resources that can support exploration wherever it happens. Whether they’re conducting groundbreaking pharmaceutical research, exploring alternative energy sources or discovering new ways to prevent financial fraud, accessible state-of-the-art AI computing resources are key to driving innovation. This new model of computing can solve the challenges of generative AI and power Read article >
( 5
min )
The GeForce RTX 4060 family will be available starting next week, bringing massive creator benefits to the popular 60-class GPUs.
( 9
min )
With the artificial intelligence conversation now mainstream, the 2023 MIT-MGB AI Cures conference saw attendance double from previous years.
( 8
min )
AWS delivers services that meet customers’ artificial intelligence (AI) and machine learning (ML) needs with services ranging from custom hardware like AWS Trainium and AWS Inferentia to generative AI foundation models (FMs) on Amazon Bedrock. In February 2022, AWS and Hugging Face announced a collaboration to make generative AI more accessible and cost efficient. Generative […]
( 7
min )
This post is co-written with Thatcher Thornberry from bpx energy. Facies classification is the process of segmenting lithologic formations from geologic data at the wellbore location. During drilling, wireline logs are obtained, which have depth-dependent geologic information. Geologists are deployed to analyze this log data and determine depth ranges for potential facies of interest from […]
( 11
min )
An update to the Omniverse Connector for Adobe Substance 3D Painter will save 3D creators across industries significant time and effort.
( 6
min )
In my last project, I had to come up with some code and algorithm to solve an interesting problem. I realized that it could lead to some off-the-beaten-path job interview question. The problem is a fundamental one. The level ranges from elementary school to one of the most difficult unsolved problems of all times, depending… Read More »An Intriguing Job Interview Question for AI/ML Professionals
The post An Intriguing Job Interview Question for AI/ML Professionals appeared first on Data Science Central.
( 21
min )
NASA's Kepler Space Telescope has been instrumental in the task of finding
the presence of exoplanets in our galaxy. This search has been supported by
computational data analysis to identify exoplanets from the signals received by
the Kepler telescope. In this paper, we consider building upon some existing
work on exoplanet identification using residual networks for the data of the
Kepler space telescope and its extended mission K2. This paper aims to explore
how deep learning algorithms can help in classifying the presence of exoplanets
with less amount of data in one case and a more extensive variety of data in
another. In addition to the standard CNN-based method, we propose a Siamese
architecture that is particularly useful in addressing classification in a
low-data scenario. The CNN and ResNet algorithms achieved an average accuracy
of 68% for three classes and 86% for two-class classification. However, for
both the three and two classes, the Siamese algorithm achieved 99% accuracy.
( 2
min )
Deep Learning (DL) has the potential to optimize machine learning in both the
scientific and clinical communities. However, greater expertise is required to
develop DL algorithms, and the variability of implementations hinders their
reproducibility, translation, and deployment. Here we present the
community-driven Generally Nuanced Deep Learning Framework (GaNDLF), with the
goal of lowering these barriers. GaNDLF makes the mechanism of DL development,
training, and inference more stable, reproducible, interpretable, and scalable,
without requiring an extensive technical background. GaNDLF aims to provide an
end-to-end solution for all DL-related tasks in computational precision
medicine. We demonstrate the ability of GaNDLF to analyze both radiology and
histology images, with built-in support for k-fold cross-validation, data
augmentation, multiple modalities and output classes. Our quantitative
performance evaluation on numerous use cases, anatomies, and computational
tasks supports GaNDLF as a robust application framework for deployment in
clinical workflows.
( 3
min )
Over-approximating the reachable sets of dynamical systems is a fundamental
problem in safety verification and robust control synthesis. The representation
of these sets is a key factor that affects the computational complexity and the
approximation error. In this paper, we develop a new approach for
over-approximating the reachable sets of neural network dynamical systems using
adaptive template polytopes. We use the singular value decomposition of linear
layers along with the shape of the activation functions to adapt the geometry
of the polytopes at each time step to the geometry of the true reachable sets.
We then propose a branch-and-bound method to compute accurate
over-approximations of the reachable sets by the inferred templates. We
illustrate the utility of the proposed approach in the reachability analysis of
linear systems driven by neural network controllers.
( 2
min )
A common pipeline in learning-based control is to iteratively estimate a
model of system dynamics, and apply a trajectory optimization algorithm -
e.g.~$\mathtt{iLQR}$ - on the learned model to minimize a target cost. This
paper conducts a rigorous analysis of a simplified variant of this strategy for
general nonlinear systems. We analyze an algorithm which iterates between
estimating local linear models of nonlinear system dynamics and performing
$\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains
sample complexity polynomial in relevant problem parameters, and, by
synthesizing locally stabilizing gains, overcomes exponential dependence in
problem horizon. Experimental results validate the performance of our
algorithm, and compare to natural deep-learning baselines.
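The model-estimation half of the estimate-then-optimize loop can be sketched as an ordinary least-squares fit of local linear dynamics from rollout data; this is a simplification for illustration, and the paper analyses the full iteration with $\mathtt{iLQR}$-like updates.

```python
import numpy as np

def fit_local_linear(X, U, Xnext):
    """Least-squares estimate of x' ~ A x + B u from rollout data:
    the model-estimation step of the loop described above (a sketch)."""
    Z = np.hstack([X, U])                       # (T, n+m) regressors
    W, *_ = np.linalg.lstsq(Z, Xnext, rcond=None)
    n = X.shape[1]
    return W[:n].T, W[n:].T                     # A (n, n), B (n, m)

rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
X = rng.standard_normal((200, 2))
U = rng.standard_normal((200, 1))
Xnext = X @ A_true.T + U @ B_true.T             # noiseless toy rollouts
A_hat, B_hat = fit_local_linear(X, U, Xnext)
```

In the full algorithm, the recovered $(A, B)$ would feed a trajectory-optimization update before new data are collected under the improved policy.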
( 2
min )
While Generative Adversarial Networks (GANs) have recently found applications
in image editing, most previous GAN-based image editing methods require
large-scale datasets with semantic segmentation annotations for training, only
provide high level control, or merely interpolate between different images.
Previous researchers have proposed EditGAN for high-quality, high-precision
semantic image editing with limited semantic annotations by finding `editing
vectors'. However, many features are not strongly associated with semantics,
and EditGAN may fail on them. Based on the
orthogonality of latent space observed by EditGAN, we propose a method to
estimate editing vectors that do not rely on semantic segmentation nor
differentiable feature estimation network. Our method assumes that there is a
correlation between the intensity distribution of features and the distribution
of hidden vectors, and estimates the relationship between the above
distributions by sampling the feature intensity of the image corresponding to
several hidden vectors. We modify Linear Discriminant Analysis (LDA) to handle
both binary and continuous feature editing. We find that this method works
well on features such as clothing type and texture, skin color, and hair.
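The flavor of the LDA step can be sketched with classical Fisher LDA on latent codes; this is a simplified stand-in for the paper's modified LDA, using made-up latent samples.

```python
import numpy as np

def lda_direction(Z_pos, Z_neg):
    """Classical Fisher LDA direction between latent codes with and
    without a binary feature -- a simplified stand-in for the modified
    LDA used to estimate editing vectors."""
    mu_pos, mu_neg = Z_pos.mean(axis=0), Z_neg.mean(axis=0)
    Sw = np.cov(Z_pos, rowvar=False) + np.cov(Z_neg, rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), mu_pos - mu_neg)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
Z_neg = rng.standard_normal((500, 3))                         # feature off
Z_pos = rng.standard_normal((500, 3)) + np.array([4.0, 0.0, 0.0])  # feature on
w = lda_direction(Z_pos, Z_neg)   # editing direction, dominated by axis 0
```

Moving a latent code along `w` would then strengthen or weaken the corresponding feature, which is the role editing vectors play in the method above.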
( 2
min )
The article presents the torchosr package - a Python package compatible with
PyTorch library - offering tools and methods dedicated to Open Set Recognition
in Deep Neural Networks. The package offers two state-of-the-art methods in the
field, a set of functions for handling base sets and generation of derived sets
for the Open Set Recognition task (where some classes are considered unknown
and used only in the testing process) and additional tools to handle datasets
and methods. The main goal of the package is to simplify and promote correct
experimental evaluation, where experiments are carried out on a large number
of derived sets with varying openness and class-to-category assignments. The
authors hope that the state-of-the-art methods available in the package will
become a correct, open-source reference implementation of the relevant
solutions in the domain.
( 2
min )
Dynamical mean-field theory is a powerful physics tool used to analyze the
typical behavior of neural networks, where neurons can be recurrently
connected, or multiple layers of neurons can be stacked. However, it is not
easy for beginners to access the essence of this tool and the underlying
physics. Here, we give a pedagogical introduction of this method in a
particular example of generic random neural networks, where neurons are
randomly and fully connected by correlated synapses and therefore the network
exhibits rich emergent collective dynamics. We also review related past and
recent important works applying this tool. In addition, a physically
transparent and alternative method, namely the dynamical cavity method, is also
introduced to derive exactly the same results. The numerical implementation of
solving the integro-differential mean-field equations is also detailed, with an
illustration of exploring the fluctuation dissipation theorem.
( 2
min )
Denoising diffusion models are a class of generative models which have
recently achieved state-of-the-art results across many domains. Gradual noise
is added to the data using a diffusion process, which transforms the data
distribution into a Gaussian. Samples from the generative model are then
obtained by simulating an approximation of the time reversal of this diffusion
initialized by Gaussian samples. Recent research has explored adapting
diffusion models for sampling and inference tasks. In this paper, we leverage
known connections to stochastic control akin to the F\"ollmer drift to extend
established neural network approximation results for the F\"ollmer drift to
denoising diffusion models and samplers.
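The forward noising process described above has a closed form in the standard DDPM-style variance-preserving parameterisation; the sketch below shows that one-shot sample for illustration, not the paper's specific construction.

```python
import numpy as np

def q_sample(x0, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) for a variance-preserving discrete
    diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    (A standard DDPM-style parameterisation, shown for illustration.)"""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(3)
x0 = np.array([1.0, -2.0, 0.5])
x_clean = q_sample(x0, 1.0, rng)   # abar = 1: no noise added yet
x_noisy = q_sample(x0, 0.0, rng)   # abar = 0: fully diffused, pure Gaussian
```

Sampling from the generative model then amounts to approximately reversing this process starting from Gaussian noise, which is where the drift approximation results apply.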
( 2
min )
Localizing behaviors of neural networks to a subset of the network's
components or a subset of interactions between components is a natural first
step towards analyzing network mechanisms and possible failure modes. Existing
work is often qualitative and ad-hoc, and there is no consensus on the
appropriate way to evaluate localization claims. We introduce path patching, a
technique for expressing and quantitatively testing a natural class of
hypotheses expressing that behaviors are localized to a set of paths. We refine
an explanation of induction heads, characterize a behavior of GPT-2, and open
source a framework for efficiently running similar experiments.
( 2
min )
The concept of neuromuscular activity recognition using instantaneous
high-density surface electromyography (HD-sEMG) images opens up new avenues for
the development of more fluid and natural muscle-computer interfaces. However,
existing approaches employ very large deep convolutional neural network
(ConvNet) architectures and complex training schemes for HD-sEMG image
recognition, which require pre-training on a very large-scale labeled dataset
and are therefore computationally expensive. To overcome this problem, we
propose S-ConvNet
and All-ConvNet models, a simple yet efficient framework for learning
instantaneous HD-sEMG images from scratch for neuromuscular activity
recognition. Without using any pre-trained models, our proposed S-ConvNet and
All-ConvNet demonstrate very competitive recognition accuracy to the more
complex state of the art for neuromuscular activity recognition based on
instantaneous HD-sEMG images, while using a ~12x smaller dataset and far
fewer learnable parameters. The experimental results show that S-ConvNet and
All-ConvNet are highly effective at learning discriminative features for
instantaneous HD-sEMG image recognition, especially in data- and
resource-constrained scenarios.
( 2
min )
Digital transformation in buildings accumulates massive operational data,
which calls for smart solutions to utilize these data to improve energy
performance. This study has proposed a solution, namely Deep Energy Twin, for
integrating deep learning and digital twins to better understand building
energy use and identify the potential for improving energy efficiency. Ontology
was adopted to create parametric digital twins to provide consistency of data
format across different systems in a building. Based on created digital twins
and collected data, deep learning methods were used for performing data
analytics to identify patterns and provide insights for energy optimization. As
a demonstration, a case study was conducted in a public historic building in
Norrk\"oping, Sweden, to compare the performance of state-of-the-art deep
learning architectures in building energy forecasting.
( 2
min )
In manufacturing settings, data collection and analysis are often
time-consuming, challenging, and costly. This also hinders the use of
advanced machine learning and data-driven methods which require a substantial
amount of offline training data to generate good results. It is particularly
challenging for small manufacturers who do not share the resources of a large
enterprise. Recently, with the introduction of the Internet of Things (IoT),
data can be collected in an integrated manner across the factory in real-time,
sent to the cloud for advanced analysis, and used to update the machine
learning model sequentially. Nevertheless, small manufacturers face two
obstacles in reaping the benefits of IoT: they may be unable to afford or
generate enough data to operate a private cloud, and they may be hesitant to
share their raw data with a public cloud. Federated learning (FL) is an
emerging concept of collaborative learning that can help small-scale industries
address these issues and learn from each other without sacrificing their
privacy. It can bring together diverse and geographically dispersed
manufacturers under the same analytics umbrella to create a win-win situation.
However, the widespread adoption of FL across multiple manufacturing
organizations remains a significant challenge. This study aims to review the
challenges and future directions of applying federated learning in the
manufacturing industry, with a specific emphasis on the perspectives of
Industry 4.0 and 5.0.
( 3
min )
In this work we introduce a self-supervised pretraining framework for
transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we
pretrain our architecture on two self-supervised tasks simultaneously to teach
the model a general understanding of the temporal and spatial dynamics of human
auditory cortex during music listening. Our pretraining results are the first
to suggest a synergistic effect of multitask training on fMRI data. Second, we
finetune the pretrained models and train additional fresh models on a
supervised fMRI classification task. We observe significantly improved accuracy
on held-out runs with the finetuned models, which demonstrates the ability of
our pretraining tasks to facilitate transfer learning. This work contributes to
the growing body of literature on transformer architectures for pretraining and
transfer learning with fMRI data, and serves as a proof of concept for our
pretraining tasks and multitask pretraining on fMRI data.
( 2
min )
"Data is the new oil": in short, data is said to be the essential resource of
the ongoing fourth industrial revolution, which has led some commentators to
equate too quickly the quantity of data with a source of wealth in itself, and
to consider the development of big data a quasi-direct cause of profit. Human
resources management is not escaping this trend, and the accumulation of
large amounts of data on employees is perceived by some entrepreneurs as a
necessary and sufficient condition for building predictive models of complex
work behaviors such as absenteeism or job performance. In fact, the analogy
is somewhat misleading: unlike oil, there are no major issues here concerning
the production of data (whose flows are generated continuously and at low
cost by various information …
( 3
min )
The detection of malicious websites has become a critical issue in
cybersecurity. Therefore, this paper offers a comprehensive review of
data-driven methods for detecting malicious websites. Traditional approaches
and their limitations are discussed, followed by an overview of data-driven
approaches. The paper establishes the data-feature-model-extension pipeline and
the latest research developments of data-driven approaches, including data
preprocessing, feature extraction, model construction and technology extension.
Specifically, this paper compares methods using deep learning models proposed
in recent years. Furthermore, the paper follows the
data-feature-model-extension pipeline to discuss the challenges together with
some future directions of data-driven methods in malicious website detection.
( 2
min )
Backpropagation (BP) is the most important gradient estimation method for
training neural networks in deep learning. However, the literature shows that
neural networks trained by BP are vulnerable to adversarial attacks. We develop
the likelihood ratio (LR) method, a new gradient estimation method, for
training a broad range of neural network architectures, including convolutional
neural networks, recurrent neural networks, graph neural networks, and spiking
neural networks, without recursive gradient computation. We propose three
methods to efficiently reduce the variance of the gradient estimation in the
neural network training process. Our experiments yield numerical results for
training different neural networks on several datasets. All results demonstrate
that the LR method is effective for training various neural networks and
significantly improves the robustness of the neural networks under adversarial
attacks relative to the BP method.
( 2
min )
Neurosymbolic AI deals with models that combine symbolic processing, as in
classic AI, with neural networks, and it is now a well-established area. These
models are emerging as an effort toward Artificial General Intelligence (AGI),
both by exploring an alternative to simply increasing dataset and model sizes
and by combining learning over the data distribution with reasoning on prior
and learned knowledge, used symbiotically. This survey examines recent
research papers in the area and provides a classification of, and comparison
between, the presented models as well as their applications.
( 2
min )
In this work, we propose a deep learning (DL)-based constitutive model for
investigating the cyclic viscoelastic-viscoplastic-damage behavior of
nanoparticle/epoxy nanocomposites with moisture content. For this, a long
short-term memory network is trained using a combined framework of a sampling
technique and a perturbation method. The training framework, along with the
training data generated by an experimentally validated
viscoelastic-viscoplastic model, enables the DL model to accurately capture the
rate-dependent stress-strain relationship and consistent tangent moduli. In
addition, the DL-based constitutive model is implemented into finite element
analysis. Finite element simulations are performed to study the effect of load
rate and moisture content on the force-displacement response of
nanoparticle/epoxy samples. Numerical examples show that the computational
efficiency of the
DL model depends on the loading condition and is significantly higher than the
conventional constitutive model. Furthermore, comparing numerical results and
experimental data demonstrates good agreement with different nanoparticle and
moisture contents.
( 2
min )
Recent years have witnessed a growth in mathematics for deep learning--which
seeks a deeper understanding of the concepts of deep learning with mathematics
and explores how to make it more robust--and deep learning for mathematics,
where deep learning algorithms are used to solve problems in mathematics. The
latter has popularised the field of scientific machine learning where deep
learning is applied to problems in scientific computing. Specifically, more and
more neural network architectures have been developed to solve specific classes
of partial differential equations (PDEs). Such methods exploit properties that
are inherent to PDEs and thus solve the PDEs better than standard feed-forward
neural networks, recurrent neural networks, or convolutional neural networks.
This has had a great impact in the area of mathematical modeling where
parametric PDEs are widely used to model most natural and physical processes
arising in science and engineering. In this work, we review such methods as
well as their extensions for parametric studies and for solving the related
inverse problems. We equally proceed to show their relevance in some industrial
applications.
( 2
min )
We characterise the learning of a mixture of two clouds of data points with
generic centroids via empirical risk minimisation in the high dimensional
regime, under the assumptions of generic convex loss and convex regularisation.
Each cloud of data points is obtained by sampling from a possibly uncountable
superposition of Gaussian distributions, whose variance has a generic
probability density $\varrho$. Our analysis covers therefore a large family of
data distributions, including the case of power-law-tailed distributions with
no covariance. We study the generalisation performance of the obtained
estimator, we analyse the role of regularisation, and the dependence of the
separability transition on the distribution scale parameters.
( 2
min )
This work concerns controlling the false discovery rate (FDR) in networks
under communication constraints. We present sample-and-forward, a flexible and
communication-efficient version of the Benjamini-Hochberg (BH) procedure for
multihop networks with general topologies. Our method demonstrates that the
nodes in a network do not need to communicate p-values to each other to achieve
decent statistical power under the global FDR control constraint. For a network
with a total of $m$ p-values, our method consists of first sampling the
(empirical) CDF of the p-values at each node and then forwarding
$\mathcal{O}(\log m)$ bits to its neighbors. Under the same assumptions as for
the original BH procedure, our method has both the provable finite-sample FDR
control as well as competitive empirical detection power, even with a few
samples at each node. We provide an asymptotic analysis of power under a
mixture model assumption on the p-values.
( 2
min )
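For reference, the centralized BH procedure that sample-and-forward approximates can be sketched in a few lines (a textbook implementation, not the distributed method itself):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Classic BH: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            k = rank                      # largest rank passing the step-up test
    return set(order[:k])                 # indices of rejected hypotheses

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals)      # rejects the two smallest p-values
```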
This study explores the number of neurons required for a Rectified Linear
Unit (ReLU) neural network to approximate multivariate monomials. We establish
an exponential lower bound on the complexity of any shallow network
approximating the product function over a general compact domain. We also
demonstrate that this lower bound does not apply to normalized Lipschitz
monomials over the unit cube. These findings suggest that shallow ReLU networks
experience the curse of dimensionality when expressing functions with a
Lipschitz parameter scaling with the dimension of the input, and that the
expressive power of neural networks depends more on their depth than on their
overall complexity.
( 2
min )
We present a novel framework for distributionally robust optimization (DRO),
called cost-aware DRO (CADRO). The key idea of CADRO is to exploit the cost
structure in the design of the ambiguity set to reduce conservatism.
Particularly, the set specifically constrains the worst-case distribution along
the direction in which the expected cost of an approximate solution increases
most rapidly. We prove that CADRO provides both a high-confidence upper bound
and a consistent estimator of the out-of-sample expected cost, and show
empirically that it produces solutions that are substantially less conservative
than existing DRO methods, while providing the same guarantees.
( 2
min )
Denoising diffusion models are a class of generative models which have
recently achieved state-of-the-art results across many domains. Gradual noise
is added to the data using a diffusion process, which transforms the data
distribution into a Gaussian. Samples from the generative model are then
obtained by simulating an approximation of the time reversal of this diffusion
initialized by Gaussian samples. Recent research has explored adapting
diffusion models for sampling and inference tasks. In this paper, we leverage
known connections to stochastic control akin to the F\"ollmer drift to extend
established neural network approximation results for the F\"ollmer drift to
denoising diffusion models and samplers.
( 2
min )
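The forward noising the abstract describes can be sketched with a variance-preserving schedule; the linear beta schedule below is an illustrative assumption, not taken from the paper:

```python
import numpy as np

# Variance-preserving forward diffusion:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Sample x_t given x_0 in one shot using the closed-form marginal."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, size=10000)      # toy 1-d data distribution
xT = q_sample(x0, T - 1, rng)             # by step T, nearly standard Gaussian
```

Samplers then simulate an approximation of the time reversal of this process starting from Gaussian noise.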
A common pipeline in learning-based control is to iteratively estimate a
model of system dynamics, and apply a trajectory optimization algorithm -
e.g.~$\mathtt{iLQR}$ - on the learned model to minimize a target cost. This
paper conducts a rigorous analysis of a simplified variant of this strategy for
general nonlinear systems. We analyze an algorithm which iterates between
estimating local linear models of nonlinear system dynamics and performing
$\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains
sample complexity polynomial in relevant problem parameters, and, by
synthesizing locally stabilizing gains, overcomes exponential dependence in
problem horizon. Experimental results validate the performance of our
algorithm, and compare to natural deep-learning baselines.
( 2
min )
ChatGPT is a sophisticated language model that has taken the world by storm. With its advanced natural language processing capabilities and…
( 11
min )
Some reactions to the latest AI news and developments, along with some AI-generated artwork. Follow the channel, to get updates on posts.
( 13
min )
Leo Anthony Celi invites industry to broaden its focus in gathering and analyzing clinical data for every population.
( 9
min )
Announcements LLM success depends on quality, transparent data Everyone from writers to coders wonder if their job is in jeopardy as prognosticators say generative AI tools will take over business in the coming years. Of course, these large language model chatbots are still unreliable, and certainly can’t be trusted to complete jobs as well as… Read More »DSC Weekly 16 May 2023 – LLM success depends on quality, transparent data
The post DSC Weekly 16 May 2023 – LLM success depends on quality, transparent data appeared first on Data Science Central.
( 19
min )
Image sourced from striim.com There’s one universal truth for every modern organization. It doesn’t matter whether you’re starting a business or already established: to succeed, you need data. Of course, not just any data will do. For strong data-driven decision-making, you also need the best insights. Thankfully, due to data analytics tools, businesses of all… Read More »6 Reasons Real-Time Data Analytics is Beneficial for Your Business
The post 6 Reasons Real-Time Data Analytics is Beneficial for Your Business appeared first on Data Science Central.
( 24
min )
In today’s technological world, data is everything. It can inform our marketing decisions, improve product creation, boost internal processes, and more. For an online business, having the best possible data is key to success. But simply having data isn’t enough. To obtain useful information, you need to understand your data. That’s where web data analytics… Read More »5 Ways to Use Analytics to Inform Website Development Decisions
The post 5 Ways to Use Analytics to Inform Website Development Decisions appeared first on Data Science Central.
( 24
min )
During the past decade, the publishing industry has undergone significant transformations due to the development of digital platforms and the widespread availability of user-generated content. Although these advancements have enabled a greater availability of information and a more diverse perspective, they have also presented challenges when it comes to ensuring that the content adheres to… Read More »Publishing Industry: The Extreme Crucial Role of AI in Content Moderation
The post Publishing Industry: The Extreme Crucial Role of AI in Content Moderation appeared first on Data Science Central.
( 22
min )
Today we are excited to announce that Together Computer’s GPT-NeoXT-Chat-Base-20B language foundation model is available for customers using Amazon SageMaker JumpStart. GPT-NeoXT-Chat-Base-20B is an open-source model to build conversational bots. You can easily try out this model and use it with JumpStart. JumpStart is the machine learning (ML) hub of Amazon SageMaker that provides access […]
( 12
min )
This research was accepted by the IEEE/ACM International Conference on Software Engineering (ICSE), which is a forum for researchers, practitioners, and educators to gather, present, and discuss the most recent innovations, trends, experiences, and issues in the field of software engineering. The Microsoft 365 Systems Innovation research group has a paper accepted at the 45th […]
The post Large-language models for automatic cloud incident management appeared first on Microsoft Research.
( 11
min )
Ten thousand years after the last woolly mammoths vanished with the last Ice Age, a team of computational biologists is on a mission to bring them back within five years. Led by synthetic biology pioneer George Church, Colossal Biosciences is also seeking to return the dodo bird and Tasmanian tiger, as well as help save Read article >
( 7
min )
Chip manufacturing is an “ideal application” for NVIDIA accelerated and AI computing, NVIDIA founder and CEO Jensen Huang said Tuesday. Detailing how the latest advancements in computing are accelerating “the world’s most important industry,” Huang spoke at ITF World 2023 semiconductor conference in Antwerp, Belgium. Huang delivered his remarks via video to a gathering of Read article >
( 7
min )
Data Science is a popular as well as vast field; till date, there are a lot of opportunities in this field, and most people, whether they…
( 25
min )
Introduction
( 17
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Mutlu Polatcan, Pınar Baki, Mehmet İkbal Özmen, Hasan Burak Yel, and Hamza Akyıldız from Getir. Getir is the pioneer of ultrafast grocery delivery. The tech company has revolutionized last-mile delivery with its “groceries in minutes” delivery proposition. Getir was founded in 2015 and operates in […]
( 8
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. To make it simpler to evaluate the capabilities of Amazon Textract, we have launched a new Bulk Document Uploader feature on the Amazon Textract console that enables you to quickly process your own set of […]
( 7
min )
$\partial\mathbb{B}$ nets are differentiable neural networks that learn
discrete boolean-valued functions by gradient descent. $\partial\mathbb{B}$
nets have two semantically equivalent aspects: a differentiable soft-net, with
real weights, and a non-differentiable hard-net, with boolean weights. We train
the soft-net by backpropagation and then `harden' the learned weights to yield
boolean weights that bind with the hard-net. The result is a learned discrete
function. `Hardening' involves no loss of accuracy, unlike existing approaches
to neural network binarization. Preliminary experiments demonstrate that
$\partial\mathbb{B}$ nets achieve comparable performance on standard machine
learning problems yet are compact (due to 1-bit weights) and interpretable (due
to the logical nature of the learnt functions).
( 2
min )
Understanding material surfaces and interfaces is vital in applications like
catalysis or electronics. Ab initio simulations, combining energies from
electronic structure with statistical mechanics, can, in principle, predict the
structure of material surfaces as a function of thermodynamic variables.
However, accurate energy simulations are prohibitive when coupled to the vast
phase space that must be statistically sampled. Here, we present a bi-faceted
computational loop to predict surface phase diagrams of multi-component
materials that accelerates both the energy scoring and statistical sampling
methods. Fast, scalable, and data-efficient machine learning interatomic
potentials are trained on high-throughput density-functional theory
calculations through closed-loop active learning. Markov-chain Monte Carlo
sampling in the semi-grand canonical ensemble is enabled by using virtual
surface sites. The predicted surfaces for GaN(0001) and SrTiO3(001) are in
agreement with past work and suggest that the proposed strategy can model
complex material surfaces and discover previously unreported surface
terminations.
( 2
min )
Pre-trained language models (PLMs) are known to improve the generalization
performance of natural language understanding models by leveraging large
amounts of data during the pre-training phase. However, the out-of-distribution
(OOD) generalization problem remains a challenge in many NLP tasks, limiting
the real-world deployment of these methods. This paper presents the first
attempt at creating a unified benchmark, named GLUE-X, for evaluating OOD
robustness in NLP models, highlighting the importance of OOD robustness and
providing insights on how to measure the robustness of a model and how to
improve it. The benchmark includes 13 publicly available datasets for OOD
testing, and evaluations are conducted on 8 classic NLP tasks over 21 popularly
used PLMs, including GPT-3 and GPT-3.5. Our findings confirm the need for
improved OOD accuracy in NLP tasks, as significant performance degradation was
observed in all settings compared to in-distribution (ID) accuracy.
( 2
min )
Given data ${\rm X}\in\mathbb{R}^{n\times d}$ and labels
$\mathbf{y}\in\mathbb{R}^{n}$, the goal is to find $\mathbf{w}\in\mathbb{R}^d$
to minimize $\Vert{\rm X}\mathbf{w}-\mathbf{y}\Vert^2$. We give a polynomial
algorithm that, \emph{oblivious to $\mathbf{y}$}, throws out $n/(d+\sqrt{n})$
data points and is a $(1+d/n)$-approximation to optimal in expectation. The
motivation is tight approximation with reduced label complexity (number of
labels revealed). We reduce label complexity by $\Omega(\sqrt{n})$. Open
question: Can label complexity be reduced by $\Omega(n)$ with tight
$(1+d/n)$-approximation?
( 2
min )
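The flavor of the claim can be checked numerically; the sketch below drops rows uniformly at random, which is only an illustrative stand-in for the paper's label-oblivious algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)   # noisy linear labels

w_full, *_ = np.linalg.lstsq(X, y, rcond=None)
loss_full = np.sum((X @ w_full - y) ** 2)

# Drop roughly n/(d + sqrt(n)) rows without looking at y (uniformly at random
# here, purely illustrative), then refit on the rest.
drop = int(n / (d + np.sqrt(n)))
keep = rng.permutation(n)[drop:]
w_sub, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
loss_sub = np.sum((X @ w_sub - y) ** 2)   # evaluated on all n points
ratio = loss_sub / loss_full              # >= 1 by optimality of w_full
```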
Breast cancer early detection is crucial for improving patient outcomes. The
Institut Catal\`a de la Salut (ICS) has launched the DigiPatICS project to
develop and implement artificial intelligence algorithms to assist with the
diagnosis of cancer. In this paper, we propose a new approach to the color
normalization problem in HER2-stained histopathological images of breast
cancer tissue, posed as a style transfer problem. We combine the Color
Deconvolution technique with the Pix2Pix GAN network to present a novel
approach to correcting the color variations between different HER2 stain brands.
Our approach focuses on maintaining the HER2 score of the cells in the
transformed images, which is crucial for the HER2 analysis. Results demonstrate
that our final model outperforms the state-of-the-art image style transfer
methods in maintaining the cell classes in the transformed images and is as
effective as them in generating realistic images.
( 2
min )
Over the last few years, the number of works applying deep learning to the
medical field has increased enormously. A rigorous assessment of these models
is required to explain their results to everyone involved in medical exams. A
recent field in machine learning is explainable artificial intelligence (XAI),
which aims to explain the results of such black-box models so that the desired
assessment is possible. This survey analyses several recent studies in the XAI
field applied to medical diagnosis research, providing some explainability of
machine learning results for several different diseases, such as cancers and
COVID-19.
( 2
min )
This paper proposes an online, provably robust, and scalable Bayesian
approach for changepoint detection. The resulting algorithm has key advantages
over previous work: it provides provable robustness by leveraging the
generalised Bayesian perspective, and also addresses the scalability issues of
previous attempts. Specifically, the proposed generalised Bayesian formalism
leads to conjugate posteriors whose parameters are available in closed form by
leveraging diffusion score matching. The resulting algorithm is exact, can be
updated through simple algebra, and is more than 10 times faster than its
closest competitor.
( 2
min )
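The closed-form conjugate updates that make such algorithms fast can be illustrated with the textbook Normal-mean conjugate pair; this is not the paper's diffusion-score-matching posterior, just the kind of constant-time algebra it enables:

```python
import random

# Conjugate update for an unknown Gaussian mean with known noise variance:
# the posterior stays Gaussian, so each observation is absorbed in O(1).
mu0, tau2 = 0.0, 10.0      # prior mean and variance
sigma2 = 1.0               # known observation variance
mu, v = mu0, tau2

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(200)]
for x in data:
    # Posterior precision adds; posterior mean is a precision-weighted average.
    prec = 1.0 / v + 1.0 / sigma2
    mu = (mu / v + x / sigma2) / prec
    v = 1.0 / prec
# mu concentrates near the true mean 2.0, and v shrinks as data accumulate.
```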
In the misspecified kernel ridge regression problem, researchers usually
assume the underlying true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a
less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS)
$\mathcal{H}$ for some $s\in (0,1)$. The existing minimax optimal results
require $\|f_{\rho}^{*}\|_{L^{\infty}} < \infty$, which implicitly requires $s >
\alpha_{0}$, where $\alpha_{0}\in (0,1)$ is the embedding index, a constant
depending on $\mathcal{H}$. Whether KRR is optimal for all $s\in (0,1)$ is
an outstanding problem lasting for years. In this paper, we show that KRR is
minimax optimal for any $s\in (0,1)$ when the $\mathcal{H}$ is a Sobolev RKHS.
( 2
min )
Anonymization techniques based on obfuscating the quasi-identifiers by means
of value generalization hierarchies are widely used to achieve preset levels of
privacy. To prevent different types of attacks against database privacy it is
necessary to apply several anonymization techniques beyond the classical
k-anonymity or $\ell$-diversity. However, the application of these methods is
directly connected to a reduction of their utility in prediction and decision
making tasks. In this work we study four classical machine learning methods
currently used for classification purposes in order to analyze the results as a
function of the anonymization techniques applied and the parameters selected
for each of them. The performance of these models is studied when varying the
value of k for k-anonymity and additional tools such as $\ell$-diversity,
t-closeness and $\delta$-disclosure privacy are also deployed on the well-known
adult dataset.
( 2
min )
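The k-anonymity and distinct ℓ-diversity properties discussed above are easy to check once quasi-identifiers have been generalized; a minimal sketch with hypothetical toy records:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

def is_distinct_l_diverse(records, quasi_ids, sensitive, l):
    """True iff every quasi-identifier group has >= l distinct sensitive values."""
    groups = {}
    for r in records:
        groups.setdefault(tuple(r[q] for q in quasi_ids), set()).add(r[sensitive])
    return all(len(vals) >= l for vals in groups.values())

# Toy records with generalized quasi-identifiers (age bucketed, ZIP truncated).
records = [
    {"age": "30-39", "zip": "481**", "income": ">50K"},
    {"age": "30-39", "zip": "481**", "income": "<=50K"},
    {"age": "40-49", "zip": "482**", "income": ">50K"},
    {"age": "40-49", "zip": "482**", "income": "<=50K"},
]
```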
The Schr\"odinger bridge problem (SBP) is gaining increasing attention in
generative modeling and showing promising potential even in comparison with the
score-based generative models (SGMs). SBP can be interpreted as an
entropy-regularized optimal transport problem, which conducts projections onto
every other marginal alternatingly. However, in practice, only approximated
projections are accessible and their convergence is not well understood. To
fill this gap, we present a first convergence analysis of the Schr\"odinger
bridge algorithm based on approximated projections. As for its practical
applications, we apply SBP to probabilistic time series imputation by
generating missing values conditioned on observed data. We show that optimizing
the transport cost improves the performance and the proposed algorithm achieves
the state-of-the-art result in healthcare and environmental data while
exhibiting the advantage of exploring both temporal and feature patterns in
probabilistic time series imputation.
( 2
min )
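The alternating projections onto the marginals can be sketched with the standard Sinkhorn iteration for entropy-regularized optimal transport (a generic textbook version, not the paper's approximated-projection analysis):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropic OT via alternating projections onto the two marginal constraints."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # project onto the second marginal
        u = a / (K @ v)                  # project onto the first marginal
    return u[:, None] * K * v[None, :]   # transport plan

n = 5
a = np.full(n, 1.0 / n)                  # uniform source marginal
b = np.full(n, 1.0 / n)                  # uniform target marginal
x = np.linspace(0.0, 1.0, n)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
P = sinkhorn(a, b, C)                    # rows/columns sum to the marginals
```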
Recent works successfully leveraged Large Language Models' (LLM) abilities to
capture abstract knowledge about the world's physics to solve decision-making
problems. Yet, the alignment between LLMs' knowledge and the environment can be
wrong and limit functional competence due to a lack of grounding. In this paper,
we study an approach (named GLAM) to achieve this alignment through functional
grounding: we consider an agent using an LLM as a policy that is progressively
updated as the agent interacts with the environment, leveraging online
Reinforcement Learning to improve its performance to solve goals. Using an
interactive textual environment designed to study higher-level forms of
functional grounding, and a set of spatial and navigation tasks, we study
several scientific questions: 1) Can LLMs boost sample efficiency for online
learning of various RL tasks? 2) How can it boost different forms of
generalization? 3) What is the impact of online learning? We study these
questions by functionally grounding several variants (size, architecture) of
FLAN-T5.
( 2
min )
This paper discusses the feasibility of continuously training the CLIP model
through streaming data. Then, by tracking the directional changes of the
representation vectors in the continuously updated CLIP model, we explore and
summarize these spatial variations as Spatial Disorder (SD), which can be
divided into Intra-modal Rotation and Inter-modal Deviation. Moreover, we
demonstrate how intra-modal rotation and inter-modal deviation lead to a
performance decline for CLIP on cross-modal retrieval tasks, both empirically
and theoretically. To alleviate the spatial disorder, we propose a simple yet
effective continual learning framework Mod-X: Maintain off-diagonal
information-matriX. Experiments on commonly used datasets with different
scales and scopes illustrate the effectiveness of our method.
( 2
min )
Critical decisions like loan approvals, medical interventions, and college
admissions are guided by predictions made in the presence of uncertainty. In
this paper, we prove that uncertainty has a disparate impact. While it imparts
errors across all demographic groups, the types of errors vary systematically:
Groups with higher average outcomes are typically assigned higher false
positive rates, while those with lower average outcomes are assigned higher
false negative rates. We show that additional data acquisition can eliminate
the disparity and broaden access to opportunity. The strategy, which we call
Affirmative Information, could stand as an alternative to Affirmative Action.
( 2
min )
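The systematic FPR/FNR asymmetry is visible even in a toy threshold model with Gaussian scores; the group means below are hypothetical:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_rates(mu_pos, mu_neg, thresh, sigma=1.0):
    """FPR/FNR of a single decision threshold on Gaussian scores."""
    fpr = 1.0 - phi((thresh - mu_neg) / sigma)   # negatives scored above threshold
    fnr = phi((thresh - mu_pos) / sigma)         # positives scored below threshold
    return fpr, fnr

# Two hypothetical groups sharing one threshold; group A has higher average scores.
t = 0.0
fpr_a, fnr_a = error_rates(mu_pos=1.5, mu_neg=-0.5, thresh=t)
fpr_b, fnr_b = error_rates(mu_pos=0.5, mu_neg=-1.5, thresh=t)
# The higher-scoring group gets more false positives, the lower-scoring group
# more false negatives, matching the qualitative pattern in the abstract.
```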
In this work, we focus on the communication aspect of decentralized learning,
which involves multiple agents training a shared machine learning model using
decentralized stochastic gradient descent (D-SGD) over distributed data. In
particular, we investigate the impact of broadcast transmission and
probabilistic random access policy on the convergence performance of D-SGD,
considering the broadcast nature of wireless channels and the link dynamics in
the communication topology. Our results demonstrate that optimizing the access
probability to maximize the expected number of successful links is a highly
effective strategy for accelerating the system convergence.
( 2
min )
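Under a simple slotted-ALOHA-style collision model (an assumption for illustration; the paper's channel model may differ), the access probability maximizing the expected number of successful broadcasts in a fully connected network of $n$ nodes is $p = 1/n$:

```python
# A broadcast succeeds iff exactly one node transmits in the slot (assumed
# collision model). Expected successes: n * p * (1 - p)^(n - 1).
def expected_success(n, p):
    return n * p * (1.0 - p) ** (n - 1)

n = 20
probs = [i / 1000 for i in range(1, 1000)]
best_p = max(probs, key=lambda p: expected_success(n, p))
# A grid search recovers the analytic maximizer p = 1/n.
```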
Existing question answering methods often assume that the input content
(e.g., documents or videos) is always accessible to solve the task.
Alternatively, memory networks were introduced to mimic the human process of
incremental comprehension and compression of the information in a
fixed-capacity memory. However, these models only learn how to maintain memory
by backpropagating errors in the answers through the entire network. Instead,
it has been suggested that humans have effective mechanisms to boost their
memorization capacities, such as rehearsal and anticipation. Drawing
inspiration from these, we propose a memory model that performs rehearsal and
anticipation while processing inputs to memorize important information for
solving question answering tasks from streaming data. The proposed mechanisms
are applied in a self-supervised fashion during training through masked
modeling tasks focused on coreference information. We validate our model on a
short-sequence
(bAbI) dataset as well as large-sequence textual (NarrativeQA) and video
(ActivityNet-QA) question answering datasets, where it achieves substantial
improvements over previous memory network approaches. Furthermore, our ablation
study confirms the proposed mechanisms' importance for memory models.
( 2
min )
Diffusion models were initially developed for text-to-image generation and
are now being utilized to generate high quality synthetic images. Preceded by
GANs, diffusion models have shown impressive results using various evaluation
metrics. However, commonly used metrics such as FID and IS are not suitable for
determining whether diffusion models are simply reproducing the training
images. Here we train StyleGAN and diffusion models, using BRATS20 and BRATS21
datasets, to synthesize brain tumor images, and measure the correlation between
the synthetic images and all training images. Our results show that diffusion
models are much more likely to memorize the training images, especially for
small datasets. Researchers should be careful when using diffusion models for
medical imaging, if the final goal is to share the synthetic images.
( 2
min )
We present GPS++, a hybrid Message Passing Neural Network / Graph Transformer
model for molecular property prediction. Our model integrates a well-tuned
local message passing component and biased global attention with other key
ideas from prior literature to achieve state-of-the-art results on large-scale
molecular dataset PCQM4Mv2. Through a thorough ablation study we highlight the
impact of individual components and find that nearly all of the model's
performance can be maintained without any use of global self-attention, showing
that message passing is still a competitive approach for 3D molecular property
prediction despite the recent dominance of graph transformers. We also find
that our approach is significantly more accurate than prior art when 3D
positional information is not available.
( 2
min )
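A single mean-aggregation message passing layer, the kind of local component GPS++ builds on, can be sketched generically (this is not the GPS++ architecture itself):

```python
import numpy as np

def mpnn_layer(H, A, W):
    """One generic message passing step: each node averages its neighbors'
    features, combines them with its own, then applies a linear map + ReLU."""
    deg = A.sum(1, keepdims=True)
    deg[deg == 0] = 1.0                       # avoid division by zero for isolated nodes
    msgs = (A @ H) / deg                      # mean over neighbors
    return np.maximum(0.0, (H + msgs) @ W)    # combine and transform

# Toy 4-node path graph (a stand-in for a molecular graph), 3-d node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))
H1 = mpnn_layer(H, A, W)                      # updated node embeddings
```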
This field case study aims to address the challenge of accurately predicting
petrophysical properties in heterogeneous reservoir formations, which can
significantly impact reservoir performance predictions. The study employed
three machine learning algorithms, namely Artificial Neural Network (ANN),
Random Forest Classifier (RFC), and Support Vector Machine (SVM), to predict
permeability log from conventional logs and match it with core data. The
primary objective of this study was to compare the effectiveness of the three
machine learning algorithms in predicting permeability and determine the
optimal prediction method. The study utilized the Flow Zone Indicator (FZI)
rock typing technique to understand the factors influencing reservoir quality.
The findings will be used to improve reservoir simulation and locate future
wells more accurately. The study concluded that the FZI approach and machine
learning algorithms are effective in predicting permeability log and improving
reservoir performance predictions.
( 2
min )
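The FZI rock-typing quantity mentioned above is computed from permeability and porosity by the standard formulas $RQI = 0.0314\sqrt{k/\phi}$, $\phi_z = \phi/(1-\phi)$, $FZI = RQI/\phi_z$:

```python
from math import sqrt

def flow_zone_indicator(k_md, phi):
    """Flow Zone Indicator from permeability (mD) and fractional porosity:
    RQI = 0.0314 * sqrt(k / phi); phi_z = phi / (1 - phi); FZI = RQI / phi_z."""
    rqi = 0.0314 * sqrt(k_md / phi)      # Reservoir Quality Index (microns)
    phi_z = phi / (1.0 - phi)            # normalized porosity index
    return rqi / phi_z

# Illustrative values: 100 mD at 20% porosity.
fzi = flow_zone_indicator(k_md=100.0, phi=0.20)
```

Samples with similar FZI belong to the same hydraulic flow unit, which is how the technique groups rock types before permeability prediction.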
Classical reinforcement learning (RL) aims to optimize the expected
cumulative reward. In this work, we consider the RL setting where the goal is
to optimize the quantile of the cumulative reward. We parameterize the policy
controlling actions by neural networks, and propose a novel policy gradient
algorithm called Quantile-Based Policy Optimization (QPO) and its variant
Quantile-Based Proximal Policy Optimization (QPPO) for solving deep RL problems
with quantile objectives. QPO uses two coupled iterations running at different
timescales for simultaneously updating quantiles and policy parameters, whereas
QPPO is an off-policy version of QPO that allows multiple updates of parameters
during one simulation episode, leading to improved algorithm efficiency. Our
numerical results indicate that the proposed algorithms outperform the existing
baseline algorithms under the quantile criterion.
( 2
min )
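The quantile-tracking iteration at the heart of such methods can be sketched with the classic stochastic-approximation update (a simplified, standalone version, not QPO's coupled two-timescale scheme):

```python
import random

def track_quantile(samples, tau, step=0.01):
    """Stochastic approximation of the tau-quantile of a stream:
    q moves up by step*tau when the sample exceeds q, down by step*(1-tau) otherwise."""
    q = 0.0
    for x in samples:
        q += step * (tau - (1.0 if x <= q else 0.0))
    return q

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]
q90 = track_quantile(samples, tau=0.9)   # true N(0,1) 0.9-quantile is about 1.28
```

In QPO, an update of this kind runs on a fast timescale while the policy parameters are updated on a slower one.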
With the publication of DINO, a variant of the Detection Transformer (DETR),
Detection Transformers are breaking the record in the object detection
benchmark with the merits of their end-to-end design and scalability. However,
the extension of DETR to oriented object detection has not been thoroughly
studied although more benefits from its end-to-end architecture are expected
such as removing NMS and anchor-related costs. In this paper, we propose a
first strong DINO-based baseline for oriented object detection. We found that
straightforward employment of DETRs for oriented object detection does not
guarantee non-duplicate prediction, and propose a simple cost to mitigate this.
Furthermore, we introduce a novel denoising strategy that uses Hungarian
matching to filter redundant noised queries and query alignment to preserve
matching consistency between Transformer decoder layers. Our proposed model
outperforms previous rotated DETRs and other counterparts, achieving
state-of-the-art performance in DOTA-v1.0/v1.5/v2.0, and DIOR-R benchmarks.
( 2
min )
We propose the GFlowNets with Human Feedback (GFlowHF) framework to improve
the exploration ability when training AI models. For tasks where the reward is
unknown, we fit the reward function through human evaluations on different
trajectories. The goal of GFlowHF is to learn a policy that is strictly
proportional to human ratings, instead of only focusing on human favorite
ratings like RLHF. Experiments show that GFlowHF can achieve better exploration
ability than RLHF.
( 2
min )
We present the Hierarchical Mixture Networks (HINT), a model family for
efficient and accurate coherent forecasting. We specialize the networks on the
task via a multivariate mixture optimized with composite likelihood and made
coherent via bootstrap reconciliation. Additionally, we robustify the networks
to stark time series scale variations, incorporating normalized feature
extraction and recomposition of output scales within their architecture. We
demonstrate 8% sCRPS improved accuracy across five datasets compared to the
existing state-of-the-art. We conduct ablation studies on our model's
components and extensively investigate the theoretical properties of the
multivariate mixture. HINT's code is available at
https://github.com/Nixtla/neuralforecast.
( 2
min )
This work proposes the use of 3D convolutional variational autoencoders
(CVAEs) to trace the changes and symptomatology produced by neurodegeneration
in Parkinson's disease (PD). In this work, we present a novel approach to
detect and quantify changes in dopamine transporter (DaT) concentration and its
spatial patterns using 3D CVAEs on Ioflupane (FPCIT) imaging. Our approach
leverages the power of deep learning to learn a low-dimensional representation
of the brain imaging data, which then is linked to different symptom categories
using regression algorithms. We demonstrate the effectiveness of our approach
on a dataset of PD patients and healthy controls, and show that general
symptomatology (UPDRS) is linked to a d-dimensional decomposition via the CVAE
with R2>0.25. Our work shows the potential of representation learning not only
in early diagnosis but in understanding neurodegeneration processes and
symptomatology.
( 2
min )
This paper proposes an online, provably robust, and scalable Bayesian
approach for changepoint detection. The resulting algorithm has key advantages
over previous work: it provides provable robustness by leveraging the
generalised Bayesian perspective, and also addresses the scalability issues of
previous attempts. Specifically, the proposed generalised Bayesian formalism
leads to conjugate posteriors whose parameters are available in closed form by
leveraging diffusion score matching. The resulting algorithm is exact, can be
updated through simple algebra, and is more than 10 times faster than its
closest competitor.
( 2
min )
Critical decisions like loan approvals, medical interventions, and college
admissions are guided by predictions made in the presence of uncertainty. In
this paper, we prove that uncertainty has a disparate impact. While it induces
errors across all demographic groups, the types of errors vary systematically:
Groups with higher average outcomes are typically assigned higher false
positive rates, while those with lower average outcomes are assigned higher
false negative rates. We show that additional data acquisition can eliminate
the disparity and broaden access to opportunity. The strategy, which we call
Affirmative Information, could stand as an alternative to Affirmative Action.
( 2
min )
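The asymmetry claimed above can be seen in a toy simulation: when true outcomes are thresholded scores and decisions rely on noisy predictions, the higher-mean group accumulates false positives and the lower-mean group false negatives. Everything below is a synthetic illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def group_error_rates(mean, n=20000, noise=1.0, threshold=0.0):
    """Hypothetical setup: true outcome y = 1[score > threshold]; the
    decision is based on a noisy prediction of the score."""
    score = rng.normal(mean, 1.0, n)
    y = score > threshold
    decision = score + rng.normal(0, noise, n) > threshold
    fpr = np.mean(decision[~y])   # false positives among true negatives
    fnr = np.mean(~decision[y])   # false negatives among true positives
    return fpr, fnr

fpr_hi, fnr_hi = group_error_rates(mean=+1.0)  # higher-average-outcome group
fpr_lo, fnr_lo = group_error_rates(mean=-1.0)  # lower-average-outcome group
print(fpr_hi, fnr_hi, fpr_lo, fnr_lo)
```

Intuitively, the higher-mean group's true negatives sit close to the threshold, so noise flips them upward more often; symmetrically for the lower-mean group's true positives.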
In Part 1 of the series “AI for Everyone: Learn How to Think Like a Data Scientist”, we discussed that for AI to reach its full economic and societal potential, we must educate and empower everyone to actively participate in the design, application, and management of meaningful, relevant, and responsible AI. We discussed the role… Read More »AI for Everyone: Learn How to Think Like a Data Scientist – Part 2
The post AI for Everyone: Learn How to Think Like a Data Scientist – Part 2 appeared first on Data Science Central.
( 19
min )
Deep machine learning models, including Convolutional Neural Networks (CNNs),
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish participants with MCI from those with normal cognition by
analyzing facial features. The data come from I-CONECT, a behavioral
intervention trial aimed at improving cognitive function through frequent
video chats. MC-ViViT extracts spatiotemporal features of videos in one branch
and augments representations with the MC module. The I-CONECT dataset is
challenging because it is imbalanced, containing Hard-Easy and
Positive-Negative samples, which impedes the performance of MC-ViViT. We
propose a loss function for Hard-Easy and Positive-Negative samples (HP Loss)
that combines Focal loss and AD-CORRE loss to address the imbalance. Our
experimental results on the I-CONECT dataset show the great potential of
MC-ViViT in predicting MCI, reaching an accuracy of 90.63% on some of the
interview videos.
( 2
min )
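HP Loss combines Focal loss with AD-CORRE loss. The Focal-loss component, which down-weights easy examples so training focuses on hard ones, can be sketched in its standard binary form (Lin et al.); the AD-CORRE part is omitted here:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: scales cross-entropy by (1 - p_t)**gamma so that
    well-classified (easy) examples contribute little to the total loss."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy example (confident and correct) contributes far less
# than a hard one (confident and wrong).
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.05]), np.array([1]))[0]
print(easy < hard)  # True
```

With gamma = 0 and alpha = 0.5 this reduces to (half) the ordinary cross-entropy, which is why focal loss is a common fix for imbalanced data.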
The use of Artificial Intelligence (AI) has become increasingly prevalent in the modern world, seeing its potential to drastically improve…
( 25
min )
Amazon SageMaker comes with two options to spin up fully managed notebooks for exploring data and building machine learning (ML) models. The first option is fast start, collaborative notebooks accessible within Amazon SageMaker Studio—a fully integrated development environment (IDE) for machine learning. You can quickly launch notebooks in Studio, easily dial up or down the […]
( 9
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Keywords or natural language questions can be […]
( 7
min )
EuroSys 2023 is the premier systems conference in Europe, and 2023 marks its 18th edition. Sponsored by ACM SIGOPS Europe and hosted May 8 to May 12, the conference covers a wide range of topics, including operating systems, real-time and networked systems, storage and middleware, and distributed, parallel, and embedded computing, as well as their […]
The post Microsoft at EuroSys 2023: Systems innovation across the stack to help support an easier, faster, safer, and smarter cloud appeared first on Microsoft Research.
( 11
min )
We consider the upper confidence bound strategy for Gaussian multi-armed
bandits with known control horizon size $N$ and build its limiting description
with a system of stochastic differential equations and ordinary differential
equations. Rewards for the arms are assumed to have unknown expected values and
known variances. A set of Monte-Carlo simulations was performed for the case of
close reward distributions, when mean rewards differ by a magnitude of order
$N^{-1/2}$, as this yields the highest normalized regret, to verify the
validity of the obtained description. We also estimated the minimal control
horizon size for which the normalized regret is not noticeably larger than its
maximum possible value.
( 2
min )
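A minimal simulation of the setting described above: UCB with known variance, on a "close distributions" instance where the mean gap is of order $N^{-1/2}$. The exploration constant and parameter choices here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_gaussian(means, sigma, N):
    """UCB for Gaussian arms with known variance: play each arm once, then
    pick the arm maximizing empirical mean + sigma * sqrt(2 ln(t) / n_i)."""
    K = len(means)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    regret = 0.0
    best = max(means)
    for t in range(N):
        if t < K:
            a = t  # initial round-robin over the arms
        else:
            ucb = sums / counts + sigma * np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(ucb))
        sums[a] += rng.normal(means[a], sigma)
        counts[a] += 1
        regret += best - means[a]
    return regret

# "Close distributions": mean gap of order N**-0.5, the hardest regime.
N = 10000
gap = N ** -0.5
r = ucb_gaussian([0.0, gap], sigma=1.0, N=N)
print(r)
```

In this regime the regret is a nontrivial fraction of the trivial bound N * gap, which is exactly why the paper uses it to probe the limiting description.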
Due to the huge number of parameters, fine-tuning of pretrained language
models (PLMs) is prone to overfitting in low-resource scenarios. In this
work, we present a novel method that operates on the hidden representations of
a PLM to reduce overfitting. During fine-tuning, our method inserts random
autoencoders between the hidden layers of a PLM, which transform activations
from the previous layers into multi-view compressed representations before
feeding them into the upper layers. The autoencoders are removed after
fine-tuning, so our method adds no extra parameters and does not increase
computation cost during inference. Our method demonstrates promising
performance improvement across a wide range of sequence- and token-level
low-resource NLP tasks.
( 2
min )
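A shape-level sketch of the idea above: a random linear autoencoder squeezed between two hidden layers compresses activations through a bottleneck while preserving their dimensionality, so the upper layers see the same interface and the module can simply be dropped at inference. This ignores how the paper samples or trains the autoencoders; all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_autoencoder(h, bottleneck):
    """Hypothetical sketch: compress hidden activations through a random
    linear bottleneck and expand back, yielding a compressed 'view' of the
    hidden states with the original shape."""
    d = h.shape[-1]
    enc = rng.normal(0, 1 / np.sqrt(d), (d, bottleneck))
    dec = rng.normal(0, 1 / np.sqrt(bottleneck), (bottleneck, d))
    return (h @ enc) @ dec

# Toy hidden states: a batch of 4 tokens with 16-dim activations.
h = rng.normal(size=(4, 16))
h_compressed = random_autoencoder(h, bottleneck=4)

# Shape is preserved, so upper layers are unaffected at the interface;
# removing the module at inference restores the identity mapping.
print(h_compressed.shape)  # (4, 16)
```

The compressed output has rank at most the bottleneck size, which is what makes the representation a regularizing "compressed view" rather than a pass-through.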
The Nash Equilibrium (NE) estimation in bidding games of electricity markets
is a key concern both of generation companies (GENCOs), for bidding strategy
optimization, and of the Independent System Operator (ISO), for market
surveillance. However, existing methods for NE estimation in future
electricity markets (FEM) are inaccurate and inefficient because prior
knowledge of bidding strategies, accumulated before any environment change
such as load demand variations, network congestion, or modifications of market
design, is not fully utilized. In this paper, a Bayes-adaptive Markov Decision
Process in FEM (BAMDP-FEM) is therefore developed to model the GENCOs' bidding
strategy optimization considering this prior knowledge. A novel Multi-Agent
Generative Adversarial Imitation Learning algorithm (MAGAIL-FEM) is then
proposed to enable GENCOs to learn simultaneously from prior knowledge and
from interactions with changing environments. The obtained NE is a Bayesian
Nash Equilibrium (BNE) with prior knowledge transferred from the previous
environment. In the case study, the superiority of the proposed algorithm over
conventional methods in terms of convergence speed is verified. The optimal
bidding strategies in the obtained BNE always lead to more profits than those
in the NE, thanks to effective learning from the prior knowledge. The BNE is
also more accurate and more consistent with real-world market conditions.
( 2
min )
Implicitly Normalized Forecaster (online mirror descent with Tsallis entropy
as prox-function) is known to be an optimal algorithm for adversarial
multi-armed problems (MAB). However, most of the complexity results rely on
bounded rewards or other restrictive assumptions. Recently closely related
best-of-both-worlds algorithm were proposed for both adversarial and stochastic
heavy-tailed MAB settings. This algorithm is known to be optimal in both
settings, but fails to exploit data fully. In this paper, we propose Implicitly
Normalized Forecaster with clipping for MAB problems with heavy-tailed
distribution on rewards. We derive convergence results under mild assumptions
on rewards distribution and show that the proposed method is optimal for both
linear and non-linear heavy-tailed stochastic MAB problems. Also we show that
algorithm usually performs better compared to best-of-two-worlds algorithm.
( 2
min )
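A sketch of the two ingredients named above: mirror-descent probabilities under the 1/2-Tsallis entropy (with the implicit normalization found by bisection) and loss clipping before the importance-weighted update. The step sizes, clipping threshold, and toy heavy-tailed losses are illustrative assumptions, not the paper's tuned choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def tsallis_inf_probs(L, eta):
    """Probabilities of online mirror descent with 1/2-Tsallis entropy:
    p_i = 4 / (eta * (L_i - x))**2 with x < min(L) chosen so sum(p) = 1,
    found here by bisection (the 'implicit normalization')."""
    lo, hi = L.min() - 1e6, L.min() - 1e-9
    for _ in range(100):
        x = (lo + hi) / 2
        s = np.sum(4.0 / (eta * (L - x)) ** 2)
        if s > 1.0:
            hi = x   # total mass too large -> push x away from min(L)
        else:
            lo = x
    p = 4.0 / (eta * (L - x)) ** 2
    return p / p.sum()

# Heavy-tailed losses handled by clipping before the importance-weighted
# estimate (the clipping threshold is an illustrative choice).
K, T, clip = 3, 2000, 5.0
L = np.zeros(K)  # cumulative importance-weighted loss estimates
for t in range(1, T + 1):
    eta = 2.0 / np.sqrt(t)
    p = tsallis_inf_probs(L, eta)
    a = rng.choice(K, p=p)
    loss = rng.standard_t(df=2) + (0.0 if a == 0 else 0.5)  # heavy tails
    L[a] += np.clip(loss, -clip, clip) / p[a]
print(int(np.argmin(L)))  # arm with the lowest estimated cumulative loss
```

Arms with smaller cumulative estimated loss receive larger probability, and the clipping keeps the importance-weighted estimates finite even when raw losses have infinite variance.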
Neural ordinary differential equations (neural ODEs) are a popular family of
continuous-depth deep learning models. In this work, we consider a large family
of parameterized ODEs with continuous-in-time parameters, which include
time-dependent neural ODEs. We derive a generalization bound for this class by
a Lipschitz-based argument. By leveraging the analogy between neural ODEs and
deep residual networks, our approach yields in particular a generalization
bound for a class of deep residual networks. The bound involves the magnitude
of the difference between successive weight matrices. We illustrate numerically
how this quantity affects the generalization capability of neural networks.
( 2
min )
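The key quantity in the bound above, the magnitude of the difference between successive weight matrices, is cheap to compute. A toy comparison between a smoothly varying (ODE-like) residual network and one with independent weights, with all sizes illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def max_successive_weight_diff(weights):
    """The quantity appearing in the bound: the largest spectral norm of
    W_{l+1} - W_l over consecutive residual-block weight matrices."""
    return max(
        np.linalg.norm(w2 - w1, ord=2)
        for w1, w2 in zip(weights, weights[1:])
    )

# Two toy 10-layer residual networks with 8x8 weights: one with slowly
# varying weights (a discretized ODE), one with independent weights.
base = rng.normal(0, 0.1, (8, 8))
smooth = [base + 0.01 * l * np.ones((8, 8)) for l in range(10)]
rough = [rng.normal(0, 0.1, (8, 8)) for _ in range(10)]

print(max_successive_weight_diff(smooth), max_successive_weight_diff(rough))
```

The smooth network's successive differences are tiny (here exactly 0.01 times the all-ones matrix, spectral norm 0.08), matching the intuition that time-dependent neural ODEs correspond to residual networks with slowly varying weights.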
Hyperparameter optimization (HPO) is a powerful technique for automating the
tuning of machine learning (ML) models. However, in many real-world
applications, accuracy is only one of multiple performance criteria that must
be considered. Optimizing these objectives simultaneously on a complex and
diverse search space remains a challenging task. In this paper, we propose
MO-DEHB, an effective and flexible multi-objective (MO) optimizer that extends
the recent evolutionary Hyperband method DEHB. We validate the performance of
MO-DEHB using a comprehensive suite of 15 benchmarks consisting of diverse and
challenging MO problems, including HPO, neural architecture search (NAS), and
joint NAS and HPO, with objectives including accuracy, latency and algorithmic
fairness. A comparative study against state-of-the-art MO optimizers
demonstrates that MO-DEHB clearly achieves the best performance across our 15
benchmarks.
( 2
min )
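At the core of any multi-objective optimizer is Pareto dominance over the objective vectors. A minimal non-dominated filter over toy (error, latency) trade-offs, not MO-DEHB itself, which additionally handles the evolutionary search and Hyperband scheduling:

```python
import numpy as np

def pareto_front(points):
    """Return indices of the non-dominated points (all objectives
    minimized): a point is kept unless some other point is <= in every
    objective and strictly < in at least one."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy (error, latency) results for four hypothetical configurations.
configs = [(0.10, 50.0), (0.08, 80.0), (0.12, 40.0), (0.09, 90.0)]
print(pareto_front(configs))  # [0, 1, 2]
```

Configuration 3 is dominated by configuration 1 (worse error and worse latency); the other three form the trade-off front an MO optimizer tries to cover.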
We propose a node clustering method for time-varying graphs based on the
assumption that the cluster labels change smoothly over time. Clustering
is one of the fundamental tasks in many science and engineering fields
including signal processing, machine learning, and data mining. Although most
existing studies focus on the clustering of nodes in static graphs, we often
encounter time-varying graphs for time-series data, e.g., social networks,
brain functional connectivity, and point clouds. In this paper, we formulate a
node clustering of time-varying graphs as an optimization problem based on
spectral clustering, with a smoothness constraint of the node labels. We solve
the problem with a primal-dual splitting algorithm. Experiments on synthetic
and real-world time-varying graphs are performed to validate the effectiveness
of the proposed approach.
( 2
min )
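A heavily simplified sketch of the pipeline above: per-snapshot two-way spectral clustering via the sign of the Fiedler vector, with a crude label-alignment step standing in for the paper's primal-dual smoothness constraint (the toy graphs and alignment rule are illustrative assumptions):

```python
import numpy as np

def fiedler_labels(A):
    """Two-way spectral clustering of one snapshot: sign of the Fiedler
    vector (eigenvector of the second-smallest Laplacian eigenvalue)."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

def cluster_time_varying(snapshots):
    """Crude smoothness heuristic: flip a snapshot's labels when that
    agrees better with the previous snapshot, resolving the inherent
    sign ambiguity of spectral embeddings toward smooth labels."""
    labels = [fiedler_labels(snapshots[0])]
    for A in snapshots[1:]:
        lab = fiedler_labels(A)
        if np.mean(lab == labels[-1]) < 0.5:
            lab = 1 - lab
        labels.append(lab)
    return labels

# Two dense 4-node blocks joined by one weak edge, repeated over 3 steps.
block = np.ones((4, 4)) - np.eye(4)
A = np.block([[block, np.zeros((4, 4))], [np.zeros((4, 4)), block]])
A[3, 4] = A[4, 3] = 0.1
labels = cluster_time_varying([A.copy() for _ in range(3)])
print(labels[0])
```

The paper instead encodes smoothness as a penalty inside one optimization problem solved by primal-dual splitting, rather than aligning independently computed labels after the fact.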
As a consequence of the increasing influence of machine learning on our
lives, everyone needs competencies not only to understand the corresponding
phenomena, but also to get involved in shaping our world and to make informed
decisions about its influence on our society. Therefore, in K-12 education,
students need to learn about core ideas and principles of machine learning.
However, for this target group, achieving all of the aforementioned goals
presents an enormous challenge. To this end, we present a teaching concept
that combines a playful and accessible unplugged approach focusing on
conceptual understanding with empowering students to actively apply machine
learning methods and reflect on their influence on society, building upon
decision tree learning.
( 2
min )
We study multi-agent reinforcement learning in the setting of episodic Markov
decision processes, where multiple agents cooperate via communication through a
central server. We propose a provably efficient algorithm based on value
iteration that enables asynchronous communication while ensuring the advantage
of cooperation with low communication overhead. With linear function
approximation, we prove that our algorithm enjoys an
$\tilde{\mathcal{O}}(d^{3/2}H^2\sqrt{K})$ regret with
$\tilde{\mathcal{O}}(dHM^2)$ communication complexity, where $d$ is the feature
dimension, $H$ is the horizon length, $M$ is the total number of agents, and
$K$ is the total number of episodes. We also provide a lower bound showing that
a minimal $\Omega(dM)$ communication complexity is required to improve the
performance through collaboration.
( 2
min )
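One standard way such cooperative algorithms keep communication complexity low (a generic illustration; the paper's exact trigger may differ) is to let an agent synchronize with the central server only when the determinant of its local regularized Gram matrix of features has grown by a constant factor since the last sync:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_agent(features, threshold=2.0, lam=1.0, d=4):
    """Generic determinant-ratio trigger: the agent syncs with the server
    only when det(local Gram matrix) has grown by `threshold` since the
    last synchronization, so syncs are rare once data accumulates."""
    gram = lam * np.eye(d)  # regularized Gram matrix of observed features
    det_last_sync = np.linalg.det(gram)
    syncs = 0
    for phi in features:
        gram += np.outer(phi, phi)
        if np.linalg.det(gram) > threshold * det_last_sync:
            syncs += 1  # here the agent would send its local statistics
            det_last_sync = np.linalg.det(gram)
    return syncs

T = 1000
features = rng.normal(size=(T, 4))
n_syncs = run_agent(features)
print(n_syncs, "syncs over", T, "steps")  # far fewer syncs than steps
```

Because the determinant can only double O(d log T) times, the number of synchronizations grows logarithmically in the horizon rather than linearly, which is the kind of low communication overhead the abstract refers to.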
As a medical doctor in Nigeria, Tobi Olatunji knows the stress of practicing in Africa’s busy hospitals. As a machine-learning scientist, he has a prescription for it. “I worked at one of West Africa’s largest hospitals, where I would routinely see more than 30 patients a day — it’s a very hard job,” said Olatunji. Read article >
( 6
min )
Make gaming a priority this GFN Thursday — time’s running out to upgrade to a GeForce NOW Priority six-month membership at 40% off the normal price. Find out how new Priority members are using the cloud to get their game on. Plus, the week brings updates for some of the hottest games in the GeForce Read article >
( 5
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )